Introduction

Recognizing the threat of invasive species transported by shipping, the International Maritime Organization (IMO) adopted the International Convention for the Control and Management of Ships’ Ballast Water and Sediments (BWM: IMO 2004). In force since September 2017, the BWM requires the installation of ballast water management systems (BWMS) that have been granted a Type Approval Certificate by the responsible Administration. The type-approval process includes land-based and shipboard testing, taking into account detailed guidelines (IMO 2016a). These include, among other requirements of the Regulation D-2 Performance Standard, testing to demonstrate the discharge of fewer than 10 viable organisms per milliliter in the size category of ≥ 10 μm and < 50 μm in minimum dimension (hereafter 10–50 μm), defining viable organisms as “organisms that have the ability to successfully generate new individuals in order to reproduce the species” (IMO 2016a). The United States Coast Guard (USCG) has its own standards for ships’ ballast water discharged in US waters (USCG 2012). The numerical limit for 10–50 μm organisms is the same, but the USCG regulates the discharge of living instead of viable organisms.

As discussed extensively elsewhere (First and Drake 2013; Wright and Welschmeyer 2015, and references therein; Cullen and MacIntyre 2016), two fundamentally different methodologies are used to enumerate viable organisms (those capable of reproducing) vs. living organisms (those showing signs of life) in BWMS Type Approval testing. The serial dilution culture-most probable number method (SDC-MPN) can be used to enumerate viable 10–50 μm photosynthetic organisms in ballast water discharge (Wright and Welschmeyer 2015; Cullen and MacIntyre 2016). For Type Approval testing, SDC-MPN enumerations are augmented by microscopic counts of motile, non-photosynthetic organisms; this is the MPN Dilution Culture+Motility method (IMO 2016b). For the enumeration of living organisms, the USCG incorporated by reference a fluorescence-based live/dead test described in the Environmental Technology Verification (ETV) Protocol (USEPA 2010). This method employs the fluorescent probes fluorescein diacetate (FDA) and 5-chloromethylfluorescein diacetate (CMFDA)—commonly referred to as vital stains—to detect membrane integrity, assumed to be a characteristic of living organisms (see MacIntyre and Cullen 2016). Motile organisms that do not fluoresce are also classified as live, so the method can be called FDA/CMFDA+Motility or Stain-Motility (Cullen 2018). In 2016, in part because its standards are for the discharge of living, not viable, organisms, the USCG rejected a proposed MPN Dilution Culture+Motility method as an alternative to its required FDA/CMFDA+Motility method (USCG 2016).

In its official guidance, the Marine Environment Protection Committee (MEPC) of the IMO (2017) states that of the two methodologies that may be used to enumerate viable organisms for Type Approval of BWMS—FDA/CMFDA+Motility and MPN Dilution Culture+Motility—only the SDC-MPN-based method is suitable for treatment technologies that are designed to destroy the ability to reproduce rather than to kill. Disinfection with ultraviolet radiation (UV) is one of these technologies (Cullen and MacIntyre 2016). Consequently, systems using UV disinfection as an endpoint cannot be suitably tested by the USCG or any other administration that does not accept SDC-MPN-based enumerations of viable organisms for BWMS Type Approval. This is in contrast to US regulations for drinking water, which explicitly recognize that inactivation (i.e., destruction of the ability to reproduce, as measured with growth-based assays comparable to SDC-MPN) is the appropriate measure of efficacy for samples treated with UV (USEPA 2006). The MEPC guidance remains open to the addition of new or revised methodologies as they become available, but at the time of writing, none besides MPN Dilution Culture+Motility is recognized as applicable to all ballast water treatment technologies (IMO 2017). As recognized for both the FDA/CMFDA+Motility method (USEPA 2010) and SDC-MPN (Cullen and MacIntyre 2016), accurate enumeration of viable but dormant organisms in the 10–50 μm size class (e.g., dinoflagellate cysts) is challenging and may require different methods.

The USCG is working with the IMO to harmonize the IMO G8 Guidelines with the US type-approval process (USCG 2016). This includes renewed consideration of SDC-MPN based on additional documentation of method verification, provided either by present users of the method or by an ongoing project initiated by the ETV program in the USA (see IMO 2016c). However, the IMO does not specify relevant performance metrics or validation criteria. Rather, the MEPC guidance states that “Analytical methodologies should be validated to the satisfaction of the Administration” (IMO 2017).

Detailed and rigorous method validation guidelines exist for widely used applications such as microbiological analysis of foods or environmental testing (Feldsine et al. 2002; Parshionikar et al. 2009), but no comparably structured guidance exists for validating methods that may be used for enumerating viable organisms in the 10–50 μm size range for BWMS Type Approval. A quantitative framework for method validation has been proposed, however (Cullen 2018). The focus is on performance metrics: precision, bias, and limits of detection. Notably, a key source of uncertainty in the enumeration of viable or living organisms—false-positive and false-negative error in the classification of viable or living organisms—cannot be assessed directly in ballast water testing because there is no accepted reference for the concentration of viable or living organisms in natural waters. Although systematic observations can be used to constrain rates of false-positive and false-negative error for samples from nature (e.g., Steinberg et al. 2011; Adams et al. 2014), it is recognized that experiments on actively growing and killed laboratory cultures are required for validation of enumeration methods (see IMO 2016d; MacIntyre and Cullen 2016).

Focusing on Tetraselmis as a target organism for use in validating BWMS, Sun and Blatchley (2017) showed that for actively growing cultures, SDC-MPN results were statistically indistinguishable from hemocytometer counts, suggesting that SDC-MPN returned unbiased enumerations of viable organisms. In turn, estimates for cultures exposed to UV-C radiation in excess of the quantified dose-response relationships were consistently near the method’s limit of detection, well below the regulatory limit of 10 viable organisms mL−1; false-positive results were therefore insignificant. We have estimated concentrations of viable organisms by SDC-MPN on independently replicated cultures of 12 species of phytoplankton in active growth (MacIntyre and Cullen 2016; MacIntyre et al. 2018): The estimates did not differ significantly from counts of total organisms from flow cytometry. Five of these species were also subjected to killing heat treatment: SDC-MPN indicated 98.4–99.998% loss of viability (MacIntyre and Cullen 2016). Key assumptions of the SDC-MPN method have therefore been tested and shown to be valid for cultures of phytoplankton. To date, comparable laboratory studies on the FDA/CMFDA+Motility method have not been published.

Besides demonstrating that a method is suitable for its intended use, it is important to document the reproducibility of results among laboratories as part of method validation (Mishalanie et al. 2016). As these authors note, “A method that proved rugged for use in one laboratory may lose that characteristic when tried in several other laboratories”: Transferability is therefore an important component of validation. For direct inter-laboratory comparisons, this requires testing the same material—an impossibility for natural samples of plankton, but achievable with the use of phytoplankton cultures grown under carefully controlled conditions (MacIntyre and Cullen 2005). Studies on laboratory cultures can thus contribute significantly to the validation of the SDC-MPN base method in support of its evaluation for use in BWMS Type Approval testing.

The objectives of this study were the following:

  1. Determine if SDC-MPN can classify actively growing phytoplankton as viable, and heat-killed phytoplankton as not, without bias.

  2. Characterize the precision of SDC-MPN results and compare it to the theoretical expectation for perfect adherence to method assumptions.

  3. Test for reproducibility of SDC-MPN by comparing results from three laboratories, two of which are new to the method.

This account includes a description of methodological best practices and a comparison of SDC-MPN calculations, including uncertainty estimates and limits of detection. The scope of this study is restricted to validation of the SDC-MPN methodology when used on phytoplankton: These first results complement the formidable body of research on MPN as applied to other target organisms using distinctly different culturing conditions, e.g., coliform or marine bacteria (Haas and Heller 1988; Button et al. 1993) or the pathogenic protist Cryptosporidium parvum (Slifko et al. 1999).

Background

The SDC–MPN method

Rooted in methods for enumerating bacteria in sanitary analyses of liquids (McCrady 1915; Cochran 1950), the SDC-MPN method is an established approach for use with multi-species communities of phytoplankton (Throndsen 1978; Furuya and Marumo 1983; Throndsen and Kristiansen 1991; Andersen and Throndsen 2003; Cerino and Zingone 2006) that has more recently been used in the testing of ballast water discharge (Madsen and Petersen 2015; Wright and Welschmeyer 2015; Cullen and MacIntyre 2016) and for quantifying the efficacy of UV radiation as a potential ballast water treatment technology (Sun and Blatchley 2017; MacIntyre et al. 2018). The length of the assay (typically weeks even under optimized conditions, MacIntyre et al. 2018) effectively precludes its use for monitoring installed BWMS for regulatory compliance. Its practical application is in Type-Approval testing.

The principles of the SDC-MPN method (Cochran 1950; Jarvis et al. 2010) are the same, whether for bacteria or phytoplankton. A sample is dispensed into a series of replicate culture tubes in tiers of dilution such that some tubes contain no viable organisms and thus can show no growth. After a suitable incubation (see “Materials and methods,” “Growth assays”), tubes are scored positive for the presence of one or more viable organisms, as demonstrated by detectable growth; otherwise, they are scored negative. The most probable number of viable organisms in the undiluted parent sample is calculated from the number of tubes scoring positive in each tier of the dilutions. The method assumes the following:

  1. Organisms are randomly distributed in each tube and evenly distributed between subsamples.

  2. Growth will be reliably detected in any tube containing one or more viable organisms.

The first assumption allows the calculation of MPN and its uncertainties based on the expectation of organisms-per-culture-tube from the Poisson distribution (Jarvis et al. 2010). If the second assumption is not satisfied, SDC-MPN will underestimate the concentration of viable organisms in direct proportion to the relative abundance of viable organisms that do not grow to detection under SDC-MPN culture conditions (Cullen 2018).
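The Poisson assumption is what turns a tube score into an estimate. The sketch below makes two assumptions: tubes are independent, and each tube in tier i holds a known volume v_i of undiluted sample. A tube is then negative with probability exp(−λv_i), and the MPN is the λ that maximizes the likelihood of the observed score. The function name `mpn_mle` and the bisection solver are illustrative choices. They are not the algorithm of any particular calculator cited here, and the sketch omits the bias corrections and rarity checks that published calculators may apply.

```python
import math


def mpn_mle(positives, n_tubes, parent_ml):
    """Maximum-likelihood MPN, in organisms per mL of the undiluted sample.

    positives[i] -- number of tubes scored positive in tier i
    n_tubes[i]   -- number of tubes inoculated in tier i
    parent_ml[i] -- volume of undiluted sample in each tube of tier i (mL)
    """
    if sum(positives) == 0:
        return 0.0        # uniformly negative: the MLE is zero
    if all(p == n for p, n in zip(positives, n_tubes)):
        return math.inf   # uniformly positive: no finite estimate exists

    def score(lam):
        # Derivative of the log-likelihood with respect to lam. A tube holding
        # volume v is negative with Poisson probability exp(-lam * v).
        s = 0.0
        for p, n, v in zip(positives, n_tubes, parent_ml):
            x = lam * v
            if p > 0 and x < 700.0:   # p*v/expm1(x) is negligible beyond this
                s += p * v / math.expm1(x)
            s -= (n - p) * v
        return s

    # score() falls monotonically with lam, so bisect on ln(lam).
    lo, hi = math.log(1e-9), math.log(1e12)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if score(math.exp(mid)) > 0.0:
            lo = mid   # the root lies at a higher concentration
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))


# Score 5-4-2 in 5-mL tubes at dilutions of 1e-4, 1e-5 and 1e-6.
c_mpn = mpn_mle([5, 4, 2], [5, 5, 5], [5 * 1e-4, 5 * 1e-5, 5 * 1e-6])
print(f"C_MPN ~ {c_mpn:.3g} organisms per mL")
```

Take the 5–4–2 score shown in Fig. 1, in 5-mL tubes at dilutions of \( 10^{-4} \) to \( 10^{-6} \). For that score the sketch returns roughly \( 4.4\times 10^{4} \) organisms per mL of parent culture. This is the plain maximum-likelihood value; it is not taken from Table 1.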

Metrics of method performance

Accuracy is the closeness of agreement between a test result and the accepted reference value (ISO 1994). It is determined by a combination of random error affecting precision and systematic error causing bias. Precision refers to the closeness of agreement between independent test results obtained under stipulated conditions. It is usually expressed in terms of imprecision, computed, for example, as the standard error of test results. Trueness is defined as the closeness of agreement between the suitably averaged value obtained from a large series of test results and the accepted reference. It is usually expressed in terms of bias. For SDC-MPN, the test result is CMPN (organisms mL−1) and the reference value would be the true concentration of viable organisms, μviable (organisms mL−1). Here, μviable is estimated by measurements of the concentration of actively growing organisms (Cviable, organisms mL−1), and bias is represented by the agreement between multiple measurements of CMPN and Cviable (see “Background,” “Factor of agreement”).

Precision and confidence intervals

The SDC-MPN calculation is a maximum-likelihood estimator of ln(μviable). The estimates, ln(CMPN), follow an approximately normal distribution whose spread is described by its standard error, \( {\sigma}_{\ln \left({C}_{\mathrm{MPN}}\right)} \). The 95% confidence interval on an estimate is \( \left[\ln \left({C}_{\mathrm{MPN}}\right)-2{\sigma}_{\ln \left({C}_{\mathrm{MPN}}\right)},\ \ln \left({C}_{\mathrm{MPN}}\right)+2{\sigma}_{\ln \left({C}_{\mathrm{MPN}}\right)}\right] \) (Cochran 1950; Jarvis et al. 2010). Correspondingly, the lower and upper 95% confidence limits on an SDC-MPN estimate, LLMPN and ULMPN, respectively, are

$$ {LL}_{\mathrm{MPN}}={C}_{\mathrm{MPN}}/\exp \left(2{\sigma}_{\ln \left({C}_{\mathrm{MPN}}\right)}\right),{UL}_{\mathrm{MPN}}={C}_{\mathrm{MPN}}\cdot \exp \left(2{\sigma}_{\ln \left({C}_{\mathrm{MPN}}\right)}\right) $$
(1)

As Cochran (1950) explained, the multiplicative factor for the confidence interval, which is represented as \( FCI=\exp \left(2{\sigma}_{\ln \left({C}_{\mathrm{MPN}}\right)}\right) \) (Cullen 2018), is an appropriate substitute for the standard error of an estimate.
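Eq. 1 can also be written in code. This short sketch (the function name is hypothetical) converts an estimate and its standard error on the natural-log scale into the two confidence limits and the multiplicative factor FCI. It uses the factor of 2 adopted above rather than 1.96.

```python
import math


def mpn_confidence_limits(c_mpn, sigma_ln):
    """Eq. 1: lower and upper 95% limits and the multiplicative factor FCI.

    sigma_ln is the standard error of ln(C_MPN); the factor of 2 follows the
    text (1.96 would be the exact normal quantile).
    """
    fci = math.exp(2.0 * sigma_ln)
    return c_mpn / fci, c_mpn * fci, fci


ll, ul, fci = mpn_confidence_limits(1000.0, 0.5)   # illustrative inputs only
print(f"LL = {ll:.0f}, UL = {ul:.0f}, FCI = {fci:.2f}")
```

With σ = 0.5, FCI = e ≈ 2.72. An estimate of 1000 organisms mL−1 then has limits of about 368 and 2718. The interval is symmetric on the log scale, so LL · UL = CMPN².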

Factor of agreement

Systematic error in the SDC-MPN estimate, for example from failure to detect viable organisms (false-negative results), leads to bias that can be detected in the distribution of the factor of agreement (FOA), CMPN/Cviable. False-negative results lead to FOA < 1.0 and false-positive results lead to FOA > 1.0. For any μviable, estimates of CMPN follow a log-normal distribution with a variance that greatly exceeds that for Cviable (see “Results,” “Precision”). Consequently, the distribution of ln(FOA), i.e., ln(CMPN/Cviable), is suitable for testing the hypothesis that SDC-MPN results are biased, i.e., that the mean of ln(FOA) ≠ 0 (Cullen 2018).
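The sketch below shows one way to run the bias test. It assumes the paired values are independent and that ln(FOA) is approximately normal. It computes ln(FOA) for each assay and forms a one-sample t statistic against zero. The helper name is hypothetical. The critical value or p-value must come from a t table or a statistics library; for example, the two-sided critical value is about 2.056 for 26 degrees of freedom at α = 0.05.

```python
import math
import statistics


def ln_foa_t_test(c_mpn, c_viable):
    """One-sample t statistic for H0: mean ln(FOA) = 0, with FOA = C_MPN / C_viable.

    Returns (geometric-mean FOA, t statistic, degrees of freedom).
    Requires at least two assays whose ln(FOA) values are not all identical.
    """
    ln_foa = [math.log(m / v) for m, v in zip(c_mpn, c_viable)]
    n = len(ln_foa)
    mean = statistics.mean(ln_foa)
    sem = statistics.stdev(ln_foa) / math.sqrt(n)   # standard error of the mean
    return math.exp(mean), mean / sem, n - 1
```

Testing the mean of ln(FOA) treats a factor-of-2 overestimate and a factor-of-2 underestimate symmetrically. An arithmetic mean of FOA would not.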

Limit of detection

The upper confidence limit for a uniformly negative SDC-MPN result can also be considered the lower limit of detection (LLDMPN) for that analytical configuration (Cullen 2018). (The analytical configuration refers to the tube volume, number and ratio of tiers of dilution, dilution in the upper tier of tubes, and the number of tubes per tier.) Jarvis et al. (2010) calculate the limit using a two-tailed, 0.025 probability for the highest concentration of target organisms that could return zero detects—no tubes in any dilution testing positive for growth—reporting the SDC-MPN estimate as zero with the upper confidence limit of ULMPN. In contrast, the MPN table from the US Food and Drug Administration, USFDA, reports the MPN for a uniformly-negative result as less than the lowest MPN for an outcome with at least one positive tube (Blodgett 2005, 2010); the value is not called a limit of detection. Because their treatment has an explicitly described mathematical foundation, SDC-MPN estimates for uniformly negative results will be reported following Jarvis et al. (2010), and the upper bound will also be considered the LLDMPN.
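For a uniformly negative score, the Poisson model gives a closed form. This minimal sketch (the function name is hypothetical) assumes that the Jarvis et al. (2010) limit is the concentration at which an all-negative outcome has probability 0.025. Since P(all tubes negative) = exp(−λ Σ nᵢvᵢ), the limit is −ln(0.025)/Σ nᵢvᵢ. Take the heat-killed configuration: five 5-mL tubes each of undiluted, \( 10^{-1} \) and \( 10^{-2} \) culture, so Σ nᵢvᵢ = 27.75 mL. For that configuration the formula reproduces the 0.13 organisms mL−1 reported in the Results.

```python
import math


def lld_all_negative(n_tubes, parent_ml, alpha=0.025):
    """Upper confidence limit for a uniformly negative (0-0-...-0) score.

    P(every tube negative | lam) = exp(-lam * sum(n_i * v_i)); solving
    exp(-lam * V_total) = alpha gives the highest concentration that would
    still return zero positive tubes with probability alpha.
    """
    v_total = sum(n * v for n, v in zip(n_tubes, parent_ml))   # mL of sample
    return -math.log(alpha) / v_total


# Heat-killed configuration: five 5-mL tubes at dilutions 1, 1e-1 and 1e-2.
lld = lld_all_negative([5, 5, 5], [5.0, 0.5, 0.05])
print(f"LLD_MPN ~ {lld:.2f} organisms per mL")   # prints 0.13
```

Adding tubes or volume to the assay raises V_total and lowers the limit in inverse proportion. That is why the heat-killed assays used undiluted culture in the top tier.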

Materials and methods

Organization and communication

The experiments were performed at Dalhousie University (DAL), where the base methodology was developed (see MacIntyre et al. 2018), the Bigelow Laboratory for Ocean Sciences (BLOS), and the University of South Carolina (U-SC). Standard operating procedures for maintaining and harvesting cultures and assessing viability by SDC-MPN were developed in a workshop held at DAL prior to the experimental phase. The research laboratories at BLOS and U-SC were experienced in the culture of phytoplankton for experimentation, but new to SDC-MPN. Communications were archived on an electronic bulletin board, and all data were loaded to a common archive.

Culture maintenance

Cultures of the prasinophyte Pyramimonas parkeae Norris et Pearson (Strain CCMP725), the diatom Thalassiosira weissflogii (Grun.) Fryxell et Hasle (Strain CCMP1050), and the dinoflagellate Amphidinium carterae Hulburt (Strain CCMP1314) were obtained from the National Center for Marine Algae and Microbiota (NCMA; East Boothbay, ME, USA). Each species is unicellular, so organisms and cells are assumed to be equivalent measures. The cultures were grown in each laboratory on f/2 + Si growth medium (Guillard 1973), using coastal seawater as a base. The medium was sterilized by filtration through a 0.2-μm Whatman Polycap AS filter cartridge (BLOS) or a 0.2-μm Pall Gelman Culture Capsule filter (DAL and U-SC). All glassware and tubing with which the cultures made contact were cleaned by soaking in 2–5% HCl, followed by copious rinsing with E-Pure water (Barnstead Nanopure/APS Water Services Corporation, Lake Balboa, CA, USA) or an equivalent, and were sterilized by autoclaving. All transfers were done in a laminar flow hood with 0.2-μm filtered air. All surfaces were wiped with 95% ethanol prior to transferring cultures, and the tubes were flamed on opening and closing to minimize the risk of contamination.

The cultures were maintained in semi-continuous growth (MacIntyre and Cullen 2005) in 50-mL Pyrex tubes closed with either unlined HDPE caps or PTFE-lined phenolic caps, on a 12:12 light/dark cycle at 20 ± 0.5 °C and at a growth irradiance of 100 μmol photons m−2 s−1 PAR supplied by fluorescent bulbs (General Electric Ecolux F34CW-RS-WM-ECO at BLOS; Osram FL40SS-W/37 at DAL; Sylvania Octron/Eco 32 W at U-SC), as monitored by thermistors and light meters in each laboratory.

To monitor growth of the cultures noninvasively, i.e., without compromising the sterility of the culture (Brand et al. 1981), dark-acclimated chlorophyll fluorescence (F, arbitrary fluorometer units) was measured daily, 3–5 h into the dark period, by inserting the culture tube directly into the sample compartments of a Turner 10-AU fluorometer (Turner Designs, USA).

The cultures were diluted every one to five generations with fresh medium to maintain them in exponential growth at low optical density. The specific growth rate, μF (day−1), was calculated from the between-day increase in dark-acclimated fluorescence (Parkhill et al. 2001; Wood et al. 2005):

$$ {\mu}_{\mathrm{F}}=\frac{1}{t_2-{t}_1}\cdot \ln \left(\frac{F_2}{\left[1-d\right]\cdot {F}_1}\right) $$
(2)

where t1 and t2 are the times (day) of successive observations, F1 and F2 (Arb.) are the corresponding blank-corrected fluorescence obtained from the Turner 10-AU, and d (dimensionless) is the dilution (proportion of culture replaced with fresh medium) between readings.
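Eq. 2 translates directly into a one-line function. The sketch below uses a hypothetical name. Dividing F₁ by (1 − d) removes the drop in fluorescence caused by the dilution itself.

```python
import math


def growth_rate_f(t1, t2, f1, f2, d=0.0):
    """Eq. 2: specific growth rate mu_F (per day) from blank-corrected fluorescence.

    d is the fraction of culture replaced with fresh medium between readings.
    """
    return math.log(f2 / ((1.0 - d) * f1)) / (t2 - t1)


# Illustrative: half the culture replaced, fluorescence back to its
# pre-dilution value one day later, i.e. one doubling per day.
print(growth_rate_f(0.0, 1.0, 10.0, 10.0, d=0.5))   # ln 2, about 0.693
```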

Physiological status of the cultures was monitored by measuring variable chlorophyll a fluorescence using a FASTOcean Sensor FRRF (Chelsea Technologies Group, UK) at BLOS, a benchtop FIRe (Fluorescence Induction and Relaxation system, Satlantic, Canada) at DAL, and the Turner 10-AU with/without 3-(3,4-dichlorophenyl)-1,1-dimethylurea (DCMU) at a final concentration of 40 μmol L−1 (Parkhill et al. 2001) at U-SC. Estimates were made non-destructively by inserting the culture tube into a custom-built cuvette holder (DAL) or by destructive analysis of sub-samples removed from the culture tubes (BLOS and U-SC). Blanks (culture medium) and standards were measured daily at DAL and U-SC. The FIRe was standardized with a 200 μmol L−1 solution of rhodamine B (R6625, Sigma-Aldrich) in E-Pure water. The FASTOcean Sensor FRRF (BLOS) is calibrated annually by the manufacturer; no daily standard was used.

The maximum quantum efficiency of Photosystem II transport, Fv/Fm, based on multiple-turnover flashes, was calculated using the manufacturers’ software, FastPro8, for data collected on the FRRF (BLOS) and as [FDCMU − F]/FDCMU, where F and FDCMU are blank-corrected fluorescence measured before and after addition of DCMU (U-SC). For the FIRe (DAL), the minimum, maximum, and variable fluorescence (F0, Fm, Fv) and the dimensionless ratio Fv/Fm were obtained by non-linear fitting of the single-turnover fluorescence induction curve using Fireworx software (Audrey Ciochetto, née Barnett, http://sourceforge.net/projects/fireworx/). Curve fits were performed with the curvature parameter in the fit, p, set to zero.

Harvesting and heat treatment

The cultures were assumed to be in balanced exponential growth and ready for experimental use when the coefficients of variation in μF and Fv/Fm were less than 10% over 10 generations (MacIntyre et al. 2018); a sensitivity analysis supporting this criterion is presented in the Supplementary Materials, “The 10-generation criterion for balanced growth.” Cultures were harvested 4–6 h after the start of the dark period. Each culture was divided among clean 7-mL glass tubes prior to setting up the growth assays, and the subsamples were either left untreated or immersed in a water bath at 50 °C for 10 min. The heat-treated organisms are assumed to be dead (MacIntyre and Cullen 2016).
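The balanced-growth criterion can be checked automatically from the daily records. This is a hypothetical sketch, not the procedure used in the study. It walks back from the most recent reading, accumulating generations as μF·Δt/ln 2, until 10 generations are covered. It then requires the coefficients of variation of μF and Fv/Fm over that window to be below 10%. It also assumes μF[i] is the rate over the interval ending at observation i.

```python
import math
import statistics


def coefficient_of_variation(values):
    return statistics.stdev(values) / statistics.mean(values)


def meets_balanced_growth(days, mu_f, fv_fm, max_cv=0.10, generations=10.0):
    """Hypothetical check of the 10-generation balanced-growth criterion.

    days  -- observation times (day), oldest first
    mu_f  -- mu_F[i] is the rate (per day) over the interval ending at days[i]
    fv_fm -- Fv/Fm measured at days[i]
    """
    gens, i = 0.0, len(days) - 1
    # Walk back from the latest reading, accumulating mu * dt / ln 2 generations.
    while i > 0 and gens < generations:
        gens += mu_f[i] * (days[i] - days[i - 1]) / math.log(2.0)
        i -= 1
    window = slice(i + 1, len(days))     # readings whose intervals were counted
    if gens < generations or len(days) - (i + 1) < 2:
        return False                     # not enough history to judge
    return (coefficient_of_variation(mu_f[window]) < max_cv
            and coefficient_of_variation(fv_fm[window]) < max_cv)
```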

Growth assays

The concentrations of organisms in the untreated samples were measured by flow cytometry using a FACScan (Becton Dickinson, USA) at BLOS, an Accuri C6 (Becton Dickinson, Canada) at DAL, and a Guava easyCyte (Millipore Sigma, USA) at U-SC. Optical alignment in the FACScan (BLOS) was validated weekly using 3-μm Rainbow Calibration Particles (Spherotech, Inc., Lake Forest, IL, USA). The Accuri C6 and Guava easyCyte were calibrated with validation beads prior to each use (Becton Dickinson Spherotech 8-peak at DAL; GuavaCheck Bead Reagent at 51,000 beads mL−1 at U-SC). Organism concentrations were estimated by counting events in a clearly defined cluster in a log-log bi-plot of chlorophyll a fluorescence (Ex. 488 nm; Em. > 670 nm) vs. side scatter after gating to exclude the background levels observed in distilled water or filtered seawater. For each sample, 150–300 μL of culture were analyzed, counting c. 1000–15,000 organisms. At BLOS, sample volume was determined gravimetrically using a microbalance (Denver Instrument, USA). At DAL and U-SC, the instrument settings were assumed to deliver accurate sample volumes. Because successive dilutions of a culture in balanced growth reduce any non-reproductive organisms to a negligible fraction (see Supplementary Materials, “The 10-generation criterion for balanced growth”), organisms in these actively growing cultures are assumed to be uniformly viable, and their measured concentrations are represented as Cviable (organisms mL−1).
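The conversion from gated events to Cviable is simple arithmetic. The minimal sketch below adds a Poisson counting coefficient of variation (1/√N), assuming counting noise dominates. Under that assumption, the 1000–15,000 events counted here correspond to a CV of roughly 3.2% to 0.8%. The gravimetric helper and its seawater density of 1.025 mg μL⁻¹ are hypothetical; the microbalance procedure used at BLOS is not described here.

```python
import math


def fcm_concentration(events, analysed_ul):
    """Organisms per mL from gated events and analysed volume (uL), plus the
    Poisson counting CV, assuming counting noise dominates."""
    conc_per_ml = events / (analysed_ul / 1000.0)
    counting_cv = 1.0 / math.sqrt(events)
    return conc_per_ml, counting_cv


def gravimetric_volume_ul(sample_mass_mg, density_mg_per_ul=1.025):
    """Hypothetical gravimetric volume; 1.025 mg per uL is an assumed seawater density."""
    return sample_mass_mg / density_mg_per_ul


# Lower end of the counts reported above: 1000 events in 200 uL.
conc, cv = fcm_concentration(1000, 200.0)
print(f"{conc:.0f} organisms per mL, counting CV {cv:.1%}")
```

A counting CV of a few percent is small compared with the factor of about 3 spanned by SDC-MPN confidence intervals. This is consistent with treating Cviable as the reference in the factor of agreement.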

The concentration of viable organisms in both untreated and heat-killed cultures was estimated using the SDC-MPN growth assay (McCrady 1915; Cochran 1950; Throndsen 1978), with growth assessed from changes in dark-adapted chlorophyll a fluorescence using “Best Practice” criteria described by MacIntyre et al. (2018). The cultures were diluted in 5-mL volumes in sterile Pyrex tubes (five per dilution) in three tiers separated by an order of magnitude (\( 10^{-4} \), \( 10^{-5} \), \( 10^{-6} \) for untreated cultures; 1 (undiluted), \( 10^{-1} \), \( 10^{-2} \) for heat-killed cultures). The dilution ranges for untreated cultures were based on predicted concentrations of viable organisms (MacIntyre et al. 2018), with the starting assumption that all organisms in the untreated culture were viable. Use of undiluted culture for the top tier of the assay with heat-killed cultures gives the maximum sensitivity in detecting viable cells.

The dilution-series tubes were incubated under the same conditions of irradiance, temperature, and day length as the parent cultures. Chlorophyll fluorescence (F) was measured with the Turner 10-AU fluorometers every 48 h. Tubes were scored as positive if fluorescence rose above 8× the fluorescence corresponding to the lower limit of detection (LLDF) of the fluorometer or above 8× the initial fluorescence, whichever was higher (MacIntyre et al. 2018). The lower limit of detection was determined for each species on each fluorometer using a dilution series of the exponentially growing culture (Anderson 1989; Miller and Miller 2005, see Supplementary Materials).
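The positive-scoring rule reduces to a single threshold per tube. The sketch below uses a hypothetical function name. It only decides positives: a tube that never crosses the threshold is scored negative only once the observation period ends, as described in the following paragraph.

```python
def is_positive(fluorescence, lld_f, factor=8.0):
    """True once any reading exceeds factor x max(LLD_F, initial F).

    fluorescence -- blank-corrected readings for one tube, first = initial value
    lld_f        -- lower limit of detection of the fluorometer (same units)
    """
    threshold = factor * max(lld_f, fluorescence[0])
    return any(f > threshold for f in fluorescence)


# Illustrative readings: the threshold is 8 x max(2.0, 0.5) = 16.
print(is_positive([0.5, 1.2, 6.0, 21.0], lld_f=2.0))   # True
```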

To minimize the risk of false negatives—tubes that should exhibit growth but in which growth was not observed because they were not monitored for long enough—tubes were only scored as negative for growth if no increase in fluorescence had been observed at tend, an interval based on the growth responses of the least-diluted tier of the exponentially growing culture. This is illustrated in Fig. 1. The duration (day) of tend is defined (Eq. 6a in MacIntyre et al. 2018) as

$$ {t}_{\mathrm{end}}=\left(3\cdot \ln (2)-\ln \left(\frac{10^{x-X}\cdot {F}_{\mathrm{init}}}{LLD_{\mathrm{F}}}\right)\right)\cdot {\mu}_{\mathrm{F}}^{-1} $$
(3)

where Finit (Arb.) and μF (day−1) are the intercept and slope, respectively, of a linear regression of ln-transformed fluorescence on time for tubes exhibiting growth in the least-diluted tier of the exponentially growing culture (Fig. 1a); and x = X, X−1, or X−2 for the dilution series \( 10^{X} \), \( 10^{X-1} \), and \( 10^{X-2} \). The underlying assumption is that the time for fluorescence to reach LLDF in a tube exhibiting growth can be predicted by assuming that the initial fluorescence scales in proportion to dilution from Finit and that the growth rate (μF) is constant across dilutions and treatments. The first term in Eq. 3 adds a three-generation buffer to the estimate to account for any lag in growth (MacIntyre et al. 2018).
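The scoring deadline can be computed directly. This sketch assumes, as the text does, that a tube k tenfold dilutions below the fitted tier starts at Finit·10⁻ᵏ. It then adds the time for fluorescence to grow from that value to LLDF, plus three generations. The function name and the inputs in the usage line are hypothetical.

```python
import math


def t_end(f_init, lld_f, mu_f, tiers_below=0):
    """Scoring deadline (day) following Eq. 3.

    f_init, mu_f -- intercept (Arb.) and slope (per day) of ln F vs. time for
                    growing tubes in the least-diluted tier of the untreated culture
    lld_f        -- fluorometer lower limit of detection (Arb.)
    tiers_below  -- tenfold dilutions below that tier (0, 1 or 2 here)
    """
    f_start = f_init * 10.0 ** (-tiers_below)     # initial F scales with dilution
    return (3.0 * math.log(2.0) - math.log(f_start / lld_f)) / mu_f


# Illustrative inputs: each further tenfold dilution adds ln(10)/mu_F days.
for k in range(3):
    print(k, round(t_end(f_init=1.0, lld_f=10.0, mu_f=0.5, tiers_below=k), 2))
```

Each additional tier delays the deadline by ln(10)/μF. This is the time a growing population needs to increase tenfold.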

Fig. 1

Time courses of in vivo fluorescence in three serial dilutions of exponentially growing (a–c) and heat-killed (d–f) subsamples of a culture of Amphidinium carterae. There were five replicates per dilution. Note the log scales. Tubes in which fluorescence increases above 8× the lower limit of detection (LLDF) are scored as positive for growth. a The dashed line (“Fitted”) is an extrapolation of the exponential growth phase, determined by regression of ln-transformed fluorescence on time for points above LLDF (see text for details). The intercept is Finit and the slope is μF. b–f The dashed lines (“predicted”) are the expected time courses of fluorescence for tubes that are sequential dilutions of (a) or are sequential dilutions (e, f) of a separate heat-killed subsample (d). The prediction assumes that the viable fraction of the culture, determined by the ratio of dilution relative to (a), would grow at the same rate. The lower limit for the prediction (c–f) is calculated for the case where a tube contains a single cell (see text for details). The shaded bars indicate a three-generation period beyond the time at which fluorescence is predicted to exceed LLDF. Tubes without an increase in fluorescence at the end of this period (tend, right margin of the gray bar) are scored as negative. The final MPN score in the exponentially growing sample is 5–4–2, and the score in the heat-killed sample is 0–0–0.

The lower limit of the initial fluorescence, Finit,min, corresponds to the signal from a single organism in the tube. This is calculated as the ratio of Finit to the mean initial number of viable organisms in the tubes from which it was determined. In practice, Finit,min was determined from the intercept of a regression for the least-diluted tier of tubes from the untreated culture (Fig. 1a) normalized to the total cell number in the tubes. Total cell number was calculated as the product of Cviable, the volume of the tubes, and their dilution.

$$ {F}_{\mathrm{init},\min }=\frac{F_{\mathrm{init}}}{C_{\mathrm{viable}}\cdot V\cdot {10}^X} $$
(4)

where Cviable (organisms mL−1) is the concentration of organisms in the parent culture, V is the volume of culture in the assay (here, 5 mL), and \( 10^{X} \) (dimensionless) is the dilution in the tubes from which Finit was estimated (here, \( 10^{-4} \)).

The starting value of fluorescence for the prediction of growth was based either on the product of Finit and the dilution ratio between the tier being observed and the tier on which Finit was estimated (e.g., Fig. 1b) or on Finit,min, whichever was higher (e.g., Fig. 1c). Predicted growth for all dilutions of the heat-treated cultures was based on Finit,min (Fig. 1d–f), because a tube containing a single viable organism takes the longest to reach detection and therefore sets the latest possible tend.
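Eq. 4 and the "whichever was higher" rule can be sketched as follows. All function names and inputs are hypothetical. The defaults of 5 mL and a \( 10^{-4} \) dilution follow the configuration described above. The last helper repeats the growth-time calculation from a given starting fluorescence, so that the single-organism case used for the heat-killed tubes can be evaluated directly.

```python
import math


def f_init_min(f_init, c_viable, volume_ml=5.0, dilution=1e-4):
    """Eq. 4: fluorescence attributable to a single organism (Arb.)."""
    return f_init / (c_viable * volume_ml * dilution)


def starting_fluorescence(f_init, tiers_below, f_min):
    """Scaled F_init for a more dilute tier, floored at one organism."""
    return max(f_init * 10.0 ** (-tiers_below), f_min)


def days_to_detection(f_start, lld_f, mu_f, buffer_generations=3):
    """Days for F to grow from f_start to LLD_F, plus the three-generation buffer."""
    return (buffer_generations * math.log(2.0) + math.log(lld_f / f_start)) / mu_f


# Illustrative inputs: 5 organisms per tube in the fitted tier, so F_init,min = F_init / 5.
f_min = f_init_min(f_init=2.0, c_viable=1e4)
for k in range(3):
    print(k, starting_fluorescence(2.0, k, f_min))
```

Once the scaled value falls below Finit,min, the floor takes over. For heat-killed tubes, which are treated as holding at most one viable organism, the deadline comes from `days_to_detection(f_min, ...)`.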

Calculations

At tend, results for the SDC-MPN assays were scored to indicate the number of positive scores out of five replicates in each of the three tiers of the assay (e.g., 5–4–2, Fig. 1a–c). Most Probable Number estimates of the concentration of viable organisms (CMPN, organisms mL−1), the standard deviation of log10(CMPN), the 95% confidence intervals on CMPN, and rarity values—all from Jarvis et al. (2010)—were generated using a spreadsheet (Wilrich 2017). Results were compared with those from a USFDA table (Blodgett 2010), calculated as described by MacIntyre et al. (2018). Comparisons with the results from other calculators are presented in Table S1 (Supplementary Materials).

Confidence limits from each calculation method were converted to the standard error of ln(CMPN) using the relationship, \( {\sigma}_{\ln \left({C}_{\mathrm{MPN}}\right)}=\ln \left({UL}_{\mathrm{MPN}}/{C}_{\mathrm{MPN}}\right)/ \)2 (see Eq. 1). The factor of agreement for each MPN determination is CMPN/Cviable. Due to equipment failure, the three measurements of Cviable for Pyramimonas parkeae could not be obtained at U-SC, so they are excluded from the comparisons of CMPN with Cviable.
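The per-assay quantities used in the comparisons can be collected with a small helper (the name is hypothetical). It computes the standard error of ln(CMPN) from the reported upper limit and the factor of agreement, and checks whether Cviable falls inside the 95% interval. An assay with no flow-cytometry count, such as the U-SC Pyramimonas parkeae cultures, keeps its precision estimate but gets no FOA.

```python
import math


def summarize_assay(c_mpn, ll_mpn, ul_mpn, c_viable=None):
    """Per-assay quantities for the inter-laboratory comparison (hypothetical helper).

    c_viable=None marks a missing flow-cytometry count; such assays are kept
    for the precision summary but excluded from comparisons with C_viable.
    """
    result = {"sigma_ln": math.log(ul_mpn / c_mpn) / 2.0}   # inverse of Eq. 1
    if c_viable is not None:
        result["FOA"] = c_mpn / c_viable
        result["within_95ci"] = ll_mpn <= c_viable <= ul_mpn
    return result


# Illustrative values only, not data from Table 1.
print(summarize_assay(1000.0, 1000.0 / math.e, 1000.0 * math.e, 1200.0))
```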

Results

Enumeration of viable and heat-killed organisms

The concentration of organisms obtained from flow cytometry, Cviable, was within the 95% confidence intervals for the SDC-MPN estimate, CMPN, for all tests on actively growing cultures from each of the three laboratories (Fig. 2, Table 1). This is consistent with the requirement for a regulatory test that SDC-MPN reliably enumerates viable organisms. More categorically, in every case, no growth was detected in tubes of the heat-treated samples (e.g., Fig. 1d–f): For scores of 0–0–0 (dilutions of 1, \( 10^{-1} \), and \( 10^{-2} \)), CMPN = 0, and ULMPN = 0.13 organisms mL−1. These results correspond to relative viabilities (cf. MacIntyre et al. 2018) of \( 2.9\times {10}^{-6} \) to \( 2.1\times {10}^{-5} \), or less, at the cell concentrations of the cultures (Table 1).

Fig. 2

Estimates of the concentrations of viable organisms by flow cytometry (Cviable) and by SDC-MPN (CMPN). Independent replicates (1–3) were assayed for three species: the dinoflagellate Amphidinium carterae, the prasinophyte Pyramimonas parkeae, and the diatom Thalassiosira weissflogii. Assessments were made at Bigelow Laboratory for Ocean Sciences (BLOS), Dalhousie University (DAL), and the University of South Carolina (U-SC). The dashed line (LLDMPN) is the limit of detection for this configuration of the test, 0.13 organisms mL−1 (see text for details). Error bars are 95% confidence intervals

Table 1 Results of SDC-MPN for cultures of three species of unicellular phytoplankton maintained in exponential growth, ensuring that essentially all organisms measured with flow cytometry (Cviable, organisms mL−1) are viable. The MPN estimates of viable organisms (CMPN, organisms mL−1) and their 95% confidence limits (LLMPN, ULMPN) are from Jarvis et al. (2010). Scores are the number of positive tubes from five replicates each at dilutions of \( 10^{-4} \), \( 10^{-5} \), and \( 10^{-6} \), respectively. The standard error of the logarithm of the estimate, \( {\sigma}_{\ln \left({C}_{\mathrm{MPN}}\right)}=\ln \left({UL}_{\mathrm{MPN}}/{C}_{\mathrm{MPN}}\right)/2. \) The factor of agreement, FOA = CMPN/Cviable. Corresponding SDC-MPN estimates for heat-killed cultures were all the same: scores = 0–0–0, CMPN = 0 and ULMPN = 0.13 organisms mL−1

MPN estimates from different calculators

The 27 assays of untreated cultures (Table 1) yielded 11 unique estimates of CMPN (Table S1). For nine of these, the estimated MPNs (Jarvis et al. 2010) were identical to calculations from the USFDA table (Blodgett 2010) when both were rounded to two significant digits. In the two remaining instances, the differences were small (− 1.0 and + 2.3%; see Supplementary Materials, Table S1) and within rounding error. Comparisons with three additional calculators revealed only two more discrepancies, of + 4 and − 1.4% (see Supplementary Materials, Table S1). These deviations are minuscule compared to the confidence intervals of the estimates (cf. Table 1).
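To make the comparison of calculators concrete, the sketch below computes the textbook maximum-likelihood MPN for a serial dilution. It finds the concentration λ at which the derivative of the binomial log-likelihood, Σ pᵢvᵢ/(exp(λvᵢ) − 1) − Σ(nᵢ − pᵢ)vᵢ, is zero. Here pᵢ is the number of positive tubes and vᵢ the volume of original sample per tube at tier i. The function name `mpn_mle` is hypothetical. This is not necessarily the algorithm implemented by Jarvis et al. (2010) or underlying the USFDA table, which may differ in corrections and rounding.

```python
import math


def mpn_mle(positives, volumes, n_tubes=5):
    """Maximum-likelihood MPN (organisms per mL) for a serial-dilution assay.

    positives: positive tubes at each dilution tier, e.g. (5, 3, 0)
    volumes:   mL of original sample inoculated into each tube at that tier
    n_tubes:   tubes per tier (assumed the same for every tier)
    """
    if len(positives) != len(volumes):
        raise ValueError("positives and volumes must have the same length")
    if sum(positives) == 0:
        return 0.0          # no growth in any tube
    if all(p == n_tubes for p in positives):
        return math.inf     # saturated: growth in every tube

    # Total sample volume in tubes that showed no growth.
    negative_volume = sum((n_tubes - p) * v for p, v in zip(positives, volumes))

    def score(lam):
        # Derivative of the log-likelihood; strictly decreasing in lam.
        total = 0.0
        for p, v in zip(positives, volumes):
            x = lam * v
            if p > 0 and x < 700.0:        # for x >= 700 the term is ~0 (avoids overflow)
                total += p * v / math.expm1(x)
        return total - negative_volume

    lo, hi = 1e-12, 1.0
    while score(hi) > 0.0:                 # widen the bracket until it contains the root
        hi *= 10.0
    for _ in range(200):                   # bisection on a log scale
        mid = math.sqrt(lo * hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)
```

For example, with five tubes per tier and inocula of 0.1, 0.01, and 0.001 mL, `mpn_mle((0, 0, 1), (0.1, 0.01, 0.001))` returns approximately 1.8 organisms per mL.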

Estimates of uncertainty in the MPN calculations

The mean of the 27 estimates of \( {\sigma}_{\ln \left({C}_{\mathrm{MPN}}\right)} \) in Table 1 is 0.535 (median 0.536; first and third quartiles 0.498 and 0.571). This corresponds to a multiplicative 95% confidence interval, FCI = 2.9. Using the USFDA table, the mean is 0.507 (median 0.557; first and third quartiles 0.428 and 0.562), which is not significantly lower according to a pooled t test (t-statistic = 1.698, df = 52, p = 0.094). The corresponding FCI = 2.76 is 6% smaller than the Jarvis et al. (2010) estimate. Among four other calculators, deviations for individual estimates, expressed as (FCImethod i − FCIJarvis)/FCIJarvis, ranged from − 26 to + 27%. The median differences ranged from 0 to − 6.7% (see Supplementary Materials, Table S1).
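The FCI values quoted in the paragraph above (2.9 and 2.76) are reproduced by FCI = exp(2σ). So are those given later under "Precision" (3.0 and 3.06). This is consistent with the definition σ = ln(ULMPN/CMPN)/2 in the Table 1 caption. That this is how the values were computed is our inference, not a statement from the text. The helper names below are hypothetical.

```python
import math


def sigma_ln_from_upper(c_mpn, ul_mpn):
    """Standard error of ln(C_MPN) as defined in the Table 1 caption: ln(UL/C) / 2."""
    return math.log(ul_mpn / c_mpn) / 2.0


def fci(sigma_ln):
    """Multiplicative 95% CI factor, assuming FCI = exp(2 * sigma_ln)."""
    return math.exp(2.0 * sigma_ln)


for s in (0.535, 0.507, 0.55, 0.56):
    print(f"sigma_ln = {s:.3f} -> FCI = {fci(s):.2f}")
```

This prints FCI values of 2.92, 2.76, 3.00, and 3.06.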

Differences between uncertainty estimates have been examined extensively (see Jarvis et al. 2010 as a point of entry to the literature). We report the Jarvis et al. (2010) confidence intervals in Table 1, but note that they have no bearing on our analyses of method precision and bias in the inter-laboratory comparison (see “Results,” “Precision” and “Bias”). Moreover, only the enumerations of viable organisms, not their confidence intervals, are used in BWMS Type Approval testing (see Cullen 2018).

Limits of detection

For the sample configuration used here to assay heat-killed cultures, LLDMPN = 0.13 organisms mL−1. This is the upper confidence limit for a negative result, based on a two-tailed 95% interval, i.e., a probability of 0.025 in the upper tail (Jarvis et al. 2010). Applying a one-tailed test (p = 0.05), the LLD is 0.11 organisms mL−1 (Cullen 2018). The USFDA table (Blodgett 2010) reports the lowest MPN for an outcome with at least one positive tube (0–0–1); the result, CMPN < 0.36 organisms mL−1, is not called a limit of detection. For this study, the mathematically defined ULMPN of 0.13 organisms mL−1 from Jarvis et al. (2010) represents the LLD for the SDC-MPN tests on heat-killed cultures. This corresponds to < 1 viable cell per 5-mL tube. If viable organisms were present at ≥ 0.13 mL−1, the probability of observing no growth in any tube would be < 2.5%, so such survivors would be detected with a probability of > 97.5%.
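Under a Poisson model, the chance that no tube shows growth is exp(−λV), where V is the total volume of sample inoculated. Setting this probability equal to α and solving gives the upper limit λ = −ln(α)/V. The minimal sketch below assumes that each tube at dilutions 1, 10−1, and 10−2 receives 5 mL of sample. That assumption is inferred from the "5-mL tube" mentioned above, and it reproduces both reported limits. The function name is hypothetical, and Jarvis et al. (2010) may compute the bound differently.

```python
import math


def upper_limit_all_negative(volumes, n_tubes=5, alpha=0.025):
    """Upper confidence limit (organisms per mL) when no tube shows growth.

    P(no growth in any tube | lam) = exp(-lam * V_total); solving
    exp(-lam * V_total) = alpha gives lam = -ln(alpha) / V_total.
    """
    v_total = n_tubes * sum(volumes)
    return -math.log(alpha) / v_total


# Assumed configuration: 5 mL of sample per tube at dilutions 1, 10^-1, 10^-2.
vols = [5.0 * d for d in (1.0, 0.1, 0.01)]
print(round(upper_limit_all_negative(vols, alpha=0.025), 2))  # 0.13 (two-tailed, upper 0.025)
print(round(upper_limit_all_negative(vols, alpha=0.05), 2))   # 0.11 (one-tailed, 0.05)
```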

The dilution ranges for untreated cultures were based on predicted concentrations of viable organisms (MacIntyre et al. 2018) so that the upper limit of detection, a score of 5–5–5, was not reached. If growth had been detected in all tubes, as can happen with sampling of natural phytoplankton communities (see Cullen 2018), Jarvis et al. (2010) would report the result as CMPN = ∞ and LLMPN = 130,000 viable organisms mL−1 for this test configuration. For the same result, the USFDA table (Blodgett 2010) returns CMPN > 320,000 (the result for 5–5–4) and LLMPN = 140,000 viable organisms mL−1. These lower confidence limits are useful even though the test is saturated. In BWMS Type Approval, a test is considered valid only if the densities of viable organisms in uptake water and control discharge water exceed minimum standards. Consequently, a result with LLMPN above the minimum confirms, with 97.5% confidence, that the requirement is met.
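The lower limit for a saturated result can be found the same way. The probability that every tube is positive, ∏(1 − exp(−λvᵢ))ⁿ, increases with λ. The lower limit is the λ at which this probability equals 0.025. The sketch below assumes 5 mL of sample per tube at dilutions 10−4, 10−5, and 10−6, i.e., the untreated-culture configuration with the same per-tube volume as above. Under that assumption it returns a value that rounds to the 130,000 mL−1 quoted for Jarvis et al. (2010). The function name is hypothetical.

```python
import math


def lower_limit_all_positive(volumes, n_tubes=5, alpha=0.025):
    """Lower confidence limit (organisms per mL) when every tube shows growth.

    P(all tubes positive | lam) = prod_i (1 - exp(-lam * v_i)) ** n_tubes rises
    with lam; the lower limit is the lam at which this probability equals alpha.
    """
    def log_p_all_positive(lam):
        # -expm1(-x) = 1 - exp(-x), computed accurately for small x.
        return n_tubes * sum(math.log(-math.expm1(-lam * v)) for v in volumes)

    target = math.log(alpha)
    lo, hi = 1e-12, 1.0
    while log_p_all_positive(hi) < target:  # widen the bracket until it contains the root
        hi *= 10.0
    for _ in range(200):                    # bisection on a log scale
        mid = math.sqrt(lo * hi)
        if log_p_all_positive(mid) < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


# Assumed configuration: 5 mL of sample per tube at dilutions 10^-4, 10^-5, 10^-6.
vols = [5.0 * d for d in (1e-4, 1e-5, 1e-6)]
print(round(lower_limit_all_positive(vols), -4))  # 130000.0
```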

Precision

If the base assumptions of the method are satisfied (see Introduction), the precision of the MPN estimate is primarily a function of the number of tubes per dilution tier and the dilution ratio, described by functions with a range of complexity (Cochran 1950; Hurley and Roscoe 1983; MacIntyre et al. 2018). For the configuration used in this study (five tubes per tier with a 10× dilution ratio), the estimated \( {\sigma}_{\ln \left({C}_{\mathrm{MPN}}\right)} \) is 0.55, from a generalized equation simplifying the results of Hurley and Roscoe (1983), as presented by MacIntyre et al. (2018). This value differs little from the mean error in Table 1 (0.535) or from Cochran’s (1950) estimate of 0.57.

Results from this laboratory validation were used to test the hypothesis that the ideal, theoretical precision of MPN was degraded in practice by the introduction of additional random measurement error during the tests. Following Cullen (2018), the metric for comparison is the logarithm of FOA, i.e., ln(CMPN /Cviable). Independent of measurement bias that would lead to deviations from the expected mean of ln(FOA) = 0, the standard error of ln(FOA) reflects the contributions of random error in each measurement. By propagation of uncertainty, for ln(FOA) = ln(CMPN) − ln(Cviable):

$$ \sigma_{\ln(\mathrm{FOA})}=\sqrt{\sigma_{\ln(C_{\mathrm{MPN}})}^{2}+\sigma_{\ln(C_{\mathrm{viable}})}^{2}} \tag{5} $$

In this study, the expected \( {\sigma}_{\ln \left({C}_{\mathrm{MPN}}\right)} \) is taken as 0.55, which corresponds to a multiplicative factor for 95% confidence intervals of FCI = 3.0. By propagation of error (Bevington 1969), the standard error of ln(Cviable) is approximately the coefficient of variation (CV) of Cviable. We assume the CV to be 3% based on triplicate assessments (i.e., analytical replicates) of samples of Thalassiosira weissflogii. The expected standard error of ln(FOA) in this study is thus \( \sqrt{0.55^2+{0.03}^2} \) = 0.55, so the contribution of flow-cytometric error is negligible. The observed value would be larger (σln(FOA) > 0.55) if experimental procedures introduced additional random measurement error.
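The arithmetic of Eq. 5 can be checked in a few lines. The sketch uses the values stated above (σ = 0.55 for ln(CMPN) and an assumed 3% CV for Cviable). It also assumes, as inferred earlier, that FCI = exp(2σ). Because independent errors add in quadrature, the small flow-cytometric CV barely changes the result.

```python
import math

sigma_ln_mpn = 0.55   # expected standard error of ln(C_MPN): five tubes, 10x dilution
cv_viable = 0.03      # assumed CV of flow-cytometric C_viable (from triplicates)

# ln(FOA) = ln(C_MPN) - ln(C_viable): independent errors add in quadrature (Eq. 5).
# For small relative errors, sd(ln C_viable) is approximately the CV of C_viable.
sigma_ln_foa = math.hypot(sigma_ln_mpn, cv_viable)

print(f"sigma_ln(FOA) = {sigma_ln_foa:.3f}")            # 0.551
print(f"FCI = exp(2 * sigma) = {math.exp(2 * sigma_ln_foa):.2f}")
```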

The eight sets of triplicate determinations of FOA in Table 1 provided unbiased estimates (s with the n − 1 divisor, n = 3) of the standard error of ln(CMPN/Cviable). The mean of the standard deviations was 0.47 (range = 0.26–0.61, n = 8), and the Shapiro-Wilk statistic for normality was 0.86 (p = 0.12, from scipy.stats.shapiro in Python). Therefore, the null hypothesis that the observations came from a normal distribution could not be rejected. With caution, we calculated a mean s.d. of ln(FOA) of 0.47, with a 95% confidence interval of 0.38–0.56 (scipy.stats.bayes_mvs). The upper confidence limit corresponds to FCI = 3.06. This result indicates that implementation of SDC-MPN in the inter-laboratory validation introduced little or no additional measurement uncertainty beyond the expected FCI = 3.0 inherent to the method as applied.
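The two SciPy calls named above can be combined as sketched below. The eight numbers are placeholders for illustration only and are NOT the study's data; the per-set standard deviations from Table 1 would be substituted. Note that `scipy.stats.bayes_mvs` defaults to `alpha=0.9`, so a 95% interval must be requested explicitly. Under its model, the interval for the mean coincides with the Student's t interval.

```python
import math

from scipy import stats

# Placeholder values for illustration only (NOT the study's data): substitute the
# eight per-set standard deviations of ln(C_MPN / C_viable) from Table 1.
sd_ln_foa = [0.30, 0.38, 0.42, 0.45, 0.50, 0.53, 0.55, 0.60]

# Shapiro-Wilk test: a large p-value means normality cannot be rejected.
w_stat, p_normal = stats.shapiro(sd_ln_foa)

# bayes_mvs returns (mean, variance, std) results; request 95% intervals.
mean_res, var_res, std_res = stats.bayes_mvs(sd_ln_foa, alpha=0.95)
lo, hi = mean_res.minmax

print(f"Shapiro-Wilk W = {w_stat:.2f}, p = {p_normal:.2f}")
print(f"mean s.d. of ln(FOA) = {mean_res.statistic:.2f} (95% CI {lo:.2f}-{hi:.2f})")
print(f"FCI at the upper limit = {math.exp(2 * hi):.2f}")  # assumes FCI = exp(2 * sigma)
```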

Although Pyramimonas was tested at U-SC, breakdown of the flow cytometer prevented calculation of relative viability. Results for the number of viable organisms in exponentially growing cultures were well within the range of those for the other laboratories. As with all other tests on heat-killed organisms, the U-SC scores for Pyramimonas were uniformly 0–0–0.

Bias

The distribution of the logarithm of the FOA (Fig. 3) represents both random measurement error and systematic error (bias) in the estimation of the concentration of viable organisms (Cullen 2018). For the 24 paired determinations of CMPN and Cviable from the validation, the mean ln(CMPN/Cviable) was − 0.030, not significantly different from 0. A two-way analysis of variance showed no significant influence of laboratory or species at p = 0.05, indicating that the pooled result was not compromised by counteracting biases (see Supplementary Materials, “Test for bias in enumeration of viable organisms”). As expected if there were no significant influences of species or laboratory on the mean ln(FOA), the standard deviation for the 24 determinations of ln(CMPN/Cviable) combined was 0.47, the same as the estimate for method precision alone (see “Results,” “Precision”). This is not significantly different from the theoretical expectation of σln(FOA) = 0.55 (chi-square test for variances: statistic = 17.13 with 23 df, p > 0.05). These analyses provide no basis to reject the hypothesis that MPN determinations were unbiased estimators of the concentration of viable organisms and that method precision was equivalent to the theoretical ideal.
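The chi-square test for a single variance uses the statistic (n − 1)s²/σ₀². A minimal sketch with the values stated above follows (s = 0.47, n = 24, σ₀ = 0.55). With the rounded s.d. of 0.47, the statistic is about 16.8. The reported 17.13 presumably reflects the unrounded s.d. The upper-tail p-value addresses the specific question of whether precision was worse than the theoretical value. The function name is hypothetical.

```python
from scipy import stats


def chi2_variance_test(s, n, sigma0):
    """Chi-square test of H0: the population s.d. equals sigma0.

    Returns the statistic, degrees of freedom, the upper-tail p-value
    (evidence that precision is worse than sigma0), and a two-sided p-value.
    """
    df = n - 1
    statistic = df * s**2 / sigma0**2
    p_upper = stats.chi2.sf(statistic, df)
    p_two_sided = min(1.0, 2.0 * min(stats.chi2.cdf(statistic, df), p_upper))
    return statistic, df, p_upper, p_two_sided


stat, df, p_upper, p_two = chi2_variance_test(s=0.47, n=24, sigma0=0.55)
print(f"chi-square = {stat:.2f} with {df} df; "
      f"p (upper) = {p_upper:.2f}; p (two-sided) = {p_two:.2f}")
```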

Fig. 3

Factor of agreement (log-transformed) between MPN estimates of the concentration of viable organisms (CMPN) and the concentration of viable organisms measured with flow cytometry on exponentially growing cultures (Cviable). The mean is − 0.030 (s.e., 0.097, n = 24), not significantly different from 0, the expectation for unbiased estimation of viable organisms with MPN. The s.d. is 0.47. The expectation for an ideal implementation of SDC-MPN in this configuration is mean = 0.0 with an s.d. of 0.55 (see “Results,” “Precision”), shown with the solid line. The line was scaled by binning a Gaussian curve with s.d. = 0.55, centered on zero, in intervals of 0.25 (the bin size in the histogram) between − 2.5 and 2.5. A weighting factor, 24 (the number of observations) divided by the sum of the binned Gaussian values, was then applied to scale the curve to the sample size.
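The scaling of the reference curve in Fig. 3 can be sketched as follows. The caption does not say whether the Gaussian was integrated over each bin or evaluated at bin centers. The sketch assumes bin-integrated probability mass, and the two approaches give nearly the same shape at this bin width. The weighting factor is then the number of observations divided by the sum of the binned values.

```python
import numpy as np
from scipy import stats

n_obs = 24                               # number of ln(FOA) observations in Fig. 3
sigma = 0.55                             # theoretical s.d. of ln(FOA)
edges = np.linspace(-2.5, 2.5, 21)       # 20 bins of width 0.25, as in the histogram
centers = 0.5 * (edges[:-1] + edges[1:])

# Probability mass of N(0, sigma) in each bin (assumed integration over bins).
mass = np.diff(stats.norm.cdf(edges, loc=0.0, scale=sigma))

# Weighting factor: n_obs divided by the sum of the binned Gaussian values,
# so that the expected counts sum to the number of observations.
weight = n_obs / mass.sum()
expected_counts = weight * mass
```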

Sensitivity to the detection threshold

The criterion for scoring positive growth was set as an increase in fluorescence 8× above the lower limit of detection, LLDF (Fig. 1). This is equivalent to three generations for cultures in balanced growth (i.e., where the relationship between F and organism concentration is constant). Where there is no reason to suspect that the relationship varies, lower thresholds for scoring positive growth are appropriate. The SDC-MPN assays were re-scored using thresholds that corresponded to 2× and 4× LLDF. In all cases, there was no change in the scoring (not shown).
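In balanced growth, a k-fold rise in fluorescence corresponds to log₂(k) generations, so thresholds of 2×, 4×, and 8× LLDF correspond to one, two, and three generations. The sketch below illustrates re-scoring under different thresholds. The rule "positive if any reading reaches k × LLDF", the function names, and the fluorescence series are all hypothetical. They are not the SOP's scoring code or data.

```python
import math


def generations_for_threshold(fold_increase):
    """Doublings needed for fluorescence to rise fold_increase-fold in balanced growth."""
    return math.log2(fold_increase)


def score_tube(fluorescence, lld_f, fold_threshold=8.0):
    """Hypothetical rule: a tube is positive if any reading reaches fold_threshold * LLD_F."""
    return any(f >= fold_threshold * lld_f for f in fluorescence)


# Hypothetical fluorescence time series for one tube (arbitrary units); LLD_F = 1.0.
readings = [0.6, 1.1, 2.5, 4.9, 9.8]
for k in (2.0, 4.0, 8.0):
    print(f"{k:.0f}x LLD_F = {generations_for_threshold(k):.0f} generations; "
          f"positive: {score_tube(readings, lld_f=1.0, fold_threshold=k)}")
```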

Discussion

In its guidance on methodologies that may be used for enumerating viable organisms for Type Approval of ballast water management systems, the Marine Environment Protection Committee of the IMO states that the methodologies “should be validated to the satisfaction of the [type-approval granting] Administration” (IMO 2017). However, criteria for validating enumeration methodologies are not well established, and gaps can be identified in published documentation of the efficacy of both the FDA/CMFDA+Motility and MPN Dilution Culture+Motility methodologies (Cullen 2018). Mindful that the USA has questioned the validation of SDC-MPN, the underpinning of the MPN Dilution Culture+Motility methodology, and recognizing that additional scientific evidence is required for any Administration to evaluate the method, we embarked on this inter-laboratory validation study. The objectives were to test whether the assay provided unbiased estimates of viable organisms, whether its precision conformed to the theoretical value, and whether it was reproducible between facilities and operators when based on a common set of SOPs.

Unbiased enumeration of viable organisms

As with vital-stain methods, the accuracy of SDC-MPN cannot be measured directly on natural samples of plankton because there is no accepted reference for μviable, the true concentration of viable organisms. In particular, it cannot be assumed that all organisms in an untreated natural sample are viable. Consequently, experiments on actively growing and killed cultures are an integral part of method validation for BWMS testing (IMO 2016d; MacIntyre and Cullen 2016). Stringent procedures ensured that cultures were in balanced, exponential growth, guaranteeing that essentially all organisms in untreated cultures were viable (see Supplementary Materials, “The 10-generation criterion for balanced growth”). The results of this study are fully consistent with the hypothesis that the SDC-MPN method enumerated all viable organisms, as has been demonstrated for 13 species of phytoplankton from 7 divisions (MacIntyre and Cullen 2016; Sun and Blatchley 2017; MacIntyre et al. 2018). Consistently negative results for heat-killed cultures confirmed that the method is not subject to false-positive error, as previously demonstrated for six species in culture (MacIntyre and Cullen 2016; Sun and Blatchley 2017).

The FOA analysis (see “Results,” “Bias” and Fig. 3) confirmed that there was no detectable bias in the estimation of viable organisms with SDC-MPN. Analysis of variance failed to detect counteracting biases among species or laboratories that might not be detectable in pooled data. Addressing our first objective, it can be concluded that SDC-MPN, the base method of the MPN Dilution Culture+Motility method, classified actively growing phytoplankton as viable, and heat-killed phytoplankton as non-viable, without bias.

In contrast, an examination of the base assumption of the FDA/CMFDA+Motility method showed that fewer than half of the 24 species studied were accurately classified as live vs. dead when quantitative measurements of stain fluorescence of exponentially growing and heat-killed cultures of phytoplankton were compared (MacIntyre and Cullen 2016). Notably, no comparable culture-based studies were included in the validation of the FDA/CMFDA+Motility method, as described by the USA (Steinberg et al. 2011; IMO 2016d), and no other directly relevant studies have been published.

Expected vs. observed precision

The imprecision inherent to the SDC-MPN method is well known (Cochran 1950): multiplicative confidence intervals of FCI = 2 to 3 are typical for configurations that are used in BWMS testing (MacIntyre et al. 2018). Considering the outcome of single tests, this uncertainty from random measurement error compares poorly to that of FDA/CMFDA+Motility. The difference is greater if the samples are concentrated prior to microscopic examination, which improves the counting precision (e.g., 95% CI = ± 15%, Cullen 2018). However, as documented by Cullen (2018), the BWMS type-approval regime requires five consecutive successful test results, with a “one strike and you’re out” rule. This requirement reverses conventional interpretations of precision and testing efficacy: The method with wider confidence limits is more protective. Five consecutive false passes are highly unlikely, and the greater risk of a single false failure imposes a margin of safety on the outcome of five tests.
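The "one strike and you're out" argument can be illustrated with a simple model. It assumes that each test's estimate is log-normally distributed around the true concentration (unbiased on the log scale), that tests are independent, and that a test passes if the estimate is below 10 organisms mL−1. σ = 0.55 is the SDC-MPN value from this study. σ = 0.07 is an assumed value standing in for a concentrated microscopic count with a 95% CI of roughly ± 15%. This is an illustrative sketch, not the analysis of Cullen (2018).

```python
import math
from statistics import NormalDist


def p_pass_all(true_conc, sigma_ln, limit=10.0, n_tests=5):
    """Probability that n_tests independent tests all report a concentration below limit.

    Assumes each estimate is log-normal around the true concentration (unbiased on
    the log scale) with standard error sigma_ln of the natural logarithm.
    """
    z = (math.log(limit) - math.log(true_conc)) / sigma_ln
    return NormalDist().cdf(z) ** n_tests


# sigma_ln = 0.55 for SDC-MPN (this study); 0.07 is an assumed value for a
# concentrated microscopic count with a 95% CI of roughly +/-15%.
for sigma_ln in (0.55, 0.07):
    for true_conc in (5.0, 8.0, 12.0):   # organisms per mL; the limit is 10
        print(f"sigma_ln={sigma_ln:.2f}  true={true_conc:4.1f}  "
              f"P(5 passes)={p_pass_all(true_conc, sigma_ln):.3f}")
```

Under these assumptions, a system with a true discharge of 8 organisms mL−1 passes five SDC-MPN tests in a row only about 12% of the time, compared with more than 99% for the more precise count. A system must therefore operate well below the limit to pass with SDC-MPN. A noncompliant system at 12 organisms mL−1 passes five SDC-MPN tests less than 1% of the time.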

In the context of this study, theoretical estimates of method precision (Cochran 1950; Hurley and Roscoe 1983; Jarvis et al. 2010) provide useful benchmarks against which to compare results for three species of phytoplankton enumerated in each of three laboratories. An appropriate metric is the FOA, CMPN/Cviable, log-transformed to accommodate the log-normal distribution of estimates of CMPN. If experimental procedures had introduced additional random measurement error into the MPN estimates, the average of the standard deviations of ln(FOA) would be greater than the theoretical σln(FOA) = 0.55. The observed mean standard deviation of ln(FOA), 0.47, was not greater. The upper 95% confidence limit of this estimate, 0.56, was only marginally higher than the theoretical value (see “Results,” “Precision”). Addressing our second objective, we conclude that, as implemented in this study, the precision of the SDC-MPN method was not detectably degraded from the theoretical ideal.

Reproducibility

The “Best Practices” SDC-MPN method was developed at DAL, and an SOP appropriate for inter-laboratory comparison was prepared during a workshop with BLOS and U-SC, which implemented MPN for the first time during this study. Addressing the third objective of this study, we found that in the hands of experimentalists from these three laboratories with established competence in phytoplankton culture, the SDC-MPN method enumerated viable phytoplankton with no evidence of systematic error (bias) and with no degradation of precision compared to the theoretical ideal.

Test criteria

The SOP developed for this comparison was based on stringent criteria designed to prevent false-negative and false-positive results in cultures that had been treated with UVC radiation and that were assayed in conditions differing from those to which they had been acclimated (MacIntyre et al. 2018). As such, the criteria had to account both for variability in the relationship between chlorophyll a fluorescence and cell concentration and, potentially, for a lag during which organisms recovered from treatment before growth resumed in the assay (cf. Hull et al. 2017). The “Best Practice” criteria required an 8× increase in fluorescence to account for acclimation to different conditions of growth and temperature and to allow for recovery from any bleaching resulting from UVC exposure. The criteria also specified a five-generation buffer in defining the duration of monitoring, tend (MacIntyre et al. 2018).

These criteria are unnecessarily stringent for the experiments described here, in which exponentially growing cultures were acclimated to, and then assayed under, the same growth conditions. Because the organisms did not have to acclimate to new growth conditions or recover from an imposed stress, relaxing the criteria did not compromise the accuracy of scoring positives or negatives. The scores were unchanged when the criterion for a positive score was set at the equivalent of one- or two-generation increases (2× and 4× increases in fluorescence) rather than the original three-generation increase (an 8× increase in fluorescence). As discussed previously (MacIntyre et al. 2018), accepting a lower threshold for defining a positive score and/or reducing the duration of the buffer in calculating tend (Eq. 3) can reduce the commitment of personnel time for the assay.

Conclusions

The MPN Dilution Culture+Motility methodology is one of two that may be used to enumerate viable organisms in the 10–50-μm size range for BWMS Type Approval, but it is the only methodology suitable for treatment technologies that are designed to destroy the ability to reproduce rather than to kill. The base method, SDC-MPN, has been used for decades to enumerate phytoplankton in natural communities. Only more recently has it been used to quantify a reduction in the number of viable organisms after exposure to environmental stressors, disinfection, or treatment with BWMS (Lehr et al. 2007; Wright and Welschmeyer 2015; MacIntyre et al. 2018). This latter application requires validation for use in Type Approval testing. Inter-laboratory comparison is an important component of method validation, and for enumeration of viable or living phytoplankton, this requires testing on laboratory cultures grown under carefully controlled conditions. Three research laboratories implemented the same experimental procedures to enumerate viable organisms using appropriate reference material (uniformly viable and heat-killed cultures of three species of unicellular microalgae), including explicit evaluation of precision, bias, and detection limits. Results were not significantly different from the expectation of unbiased enumeration of viable organisms with no degradation of precision from the theoretical ideal. Reproducibility was demonstrated even though two of the laboratories implemented the SDC-MPN protocol for the first time in this study. These results might not surprise practitioners, but they should not be taken for granted in the process of method validation. This study is, to our knowledge, the first inter-laboratory comparison of a method used to enumerate viable organisms in the 10–50-μm size range for BWMS Type Approval.