Biodiversity and Conservation, Volume 25, Issue 10, pp 1899–1920

Community ecological modelling as an alternative to physiographic classifications for marine conservation planning

  • Emily M Rubidge
  • Katie S. P. Gale
  • Janelle M. R. Curtis
Original Paper


Accurate mapping of marine species and habitats is an important yet challenging component of establishing networks of representative marine protected areas. Due to limited biological data, marine classifications based on abiotic data are often used as surrogates to represent biological patterns. We tested the surrogacy of an existing physiographic marine classification using non-metric multidimensional scaling and permutational analysis of variance to determine whether species composition was significantly different among physiographic units. We also present an alternative ecological classification that incorporates biological and environmental data in a community modeling approach. We use data on 174 species of demersal fish and benthic invertebrates to identify mesoscale biological assemblages in a 100,000 km2 study area in the northeast Pacific Ocean. We identified assemblages using cluster analysis then used a random forest model with 12 environmental variables to delineate mesoscale ecological units. Our community modelling approach resulted in five geographically coherent ecological units that were best explained by changes in depth, temperature and salinity. Our model showed high predictive performance (AUC = 0.93) and the resulting ecological units represent more distinct species assemblages than those delineated by physiographic variables alone. A strength of our analysis is the ability to map model uncertainty to identify transition zones at unit boundaries. The output of this study provides a biotic driven classification that can be used to better achieve representativity in the MPA planning process.


MPA network; Ecological representation; Random forest; Cluster analysis; IndVal


Biodiversity is rapidly declining as human activities drive global-scale species losses and ecosystem changes (Pimm et al. 2014; Ceballos et al. 2015). Globally, less than 3.5 % of the marine environment currently benefits from protection, compared to 15.4 % of the terrestrial environment (Juffe-Bignoli et al. 2014). Overexploitation, habitat loss, pollution, invasive species expansion and climate change threaten to break down the social, ecological and economic benefits society derives from the world’s oceans (Worm et al. 2006). In response to the increasing threats to biodiversity, the Convention on Biological Diversity (CBD) called upon member states to protect at least 10 % of representative coastal and marine areas, emphasizing those areas of particular importance for biodiversity and ecosystem services (CBD 2010). Implementing an effective network of representative marine protected areas (MPAs) helps to achieve this biodiversity target while conferring ecosystem resilience to environmental change.

In order to design an effective MPA network, several criteria must be met including ecological representativity, connectivity, and the protection of vulnerable habitats and species (e.g., Airamé et al. 2003). The MPA network design criterion of representativity aims to protect examples of the full range of ecosystems and habitat types found within a given planning area (Roberts et al. 2003) and builds ecological resilience to the impacts of human activities into the network by incorporating a proportion of each type of ecosystem and habitat. Achieving representativity in an MPA network requires information on the distribution of ecosystems, species, and habitats across multiple spatial scales (Roff et al. 2003; Last et al. 2010; Harris 2012a).

An ecological classification system that partitions areas into relatively homogeneous spatial units based on a selected set of environmental and/or biological variables can be used to delineate ecological units as a basis for implementing the representativity criterion (Roff and Zacharias 2011). Ideally, marine ecological classifications should reflect the relationships between physical features and the distribution and abundance of species (Gregr et al. 2012), however due to pervasive limitations in availability of biological data, most marine ecological classifications are built on the physical geography or the abiotic conditions of the planning area (also referred to as a physiographic classification), and assume that these physical variables are reliable surrogates of biological patterns (e.g., Roff et al. 2003). While abiotic surrogates can reflect biological patterns (e.g., Roff and Taylor 2000), physiographic classifications do not perform as well as biologically informed classifications at fulfilling the representativity criterion in conservation planning (Lombard et al. 2003; Rodrigues and Brooks 2007; Sutcliffe et al. 2015). Sutcliffe et al. (2015) found that abiotic classifications may be used for an initial reserve design when biological information is insufficient, but classifications that are biologically informed either through weighting the biological importance of abiotic variables (e.g., Pitcher et al. 2012) or by explicitly incorporating biological data, will produce more representative reserves.

On the Pacific coast of Canada, a joint federal-provincial strategy was recently initiated to develop an “ecologically comprehensive, resilient and representative network of marine protected areas” (Canada-British Columbia Marine Protected Area Network Strategy 2014) necessitating a deeper understanding of the spatial distribution of species, ecosystems and habitats in the region. In the late 1990’s, the provincial government of British Columbia (BC) developed a marine ecological classification system, British Columbia Marine Ecological Classification (BCMEC), using a physiographic approach. BCMEC is a hierarchical classification that comprises five nested divisions based on physical properties of the environment from ice regimes at the top of the hierarchy to currents, substrate, relief and exposure at the lowest level (Zacharias et al. 1998, available at

BCMEC is grounded in expert knowledge; however, the degree to which it represents patterns of biological diversity at lower levels (Ecosections, Ecounits) remains untested. The 12 BCMEC Ecosections on the BC coast are delineated based on ocean currents and stratification (Zacharias et al. 1998). Many studies use the Ecosections in a description of their study site, or use their boundaries to summarize biological or socio-economic information (e.g., Ban and Vincent 2009; Nelson et al. 2011; Robb 2014). However, without biological validation, their utility in fulfilling the ecological representativity criterion for conservation planning is uncertain. The BCMEC Ecounits, nested below the Ecosections, attempt to classify the seabed using available physiographic data including currents, depth, bottom substrate, bottom relief, and wave exposure (AXYS Environmental Consulting Ltd 2000, 2001), but the Ecounits have been criticized for the scale and accuracy of the input substrate layer, a methodology that is difficult to repeat, and the lack of biological validation (e.g., Levings and Jamieson 1999; Johannessen et al. 2004). Given these constraints, the BCMEC Ecounits are not examined further in this study.

In this study, we use biological information to first validate the BCMEC Ecosections and then develop a new ecological classification that integrates biotic and abiotic information. We focus our biological validation of BCMEC on the Ecosection level to determine its utility as a representative layer of benthic biodiversity patterns. Specifically, we use biological data from the shelf and continental slope along British Columbia’s coast (Fig. 1a) to examine whether the BCMEC Ecosections (Fig. 1c) represent mesoscale (10s to 1000s of km) benthic biodiversity patterns. We then propose and evaluate an alternative approach that incorporates biological and environmental data into a community-based benthic ecological classification at a similar scale to the Ecosections for use in MPA network planning in the region. Explicitly recognizing the overarching influence of coarse-scale biodiversity patterns in a hierarchical ecological classification, by incorporating biological data into the delineation of its units, ensures that these coarse-scale patterns are captured at the lower levels of the classification hierarchy where more fine-scale data are often lacking (Last et al. 2010).
Fig. 1

a Study area showing the Northern Shelf (NSB, gray) and the Southern Shelf (SSB, dark gray) Bioregions. The study area was gridded into 4 km × 4 km cells (“sites”); any grid cell that intersected with land was not considered. b Distribution of sample sites (orange) for which biological data was available (n = 3615), as well as excluded sites (purple) that only contained records of one species (n = 92). Grey areas within the study area had no biological data. c Subset of British Columbia Marine Ecological Classification (BCMEC) Ecosections analyzed in this paper

To delineate ecological units, we used a community modeling “assemble first, predict later” approach described by Ferrier and Guisan (2006). Using this approach, biological survey data from systematic fisheries-independent groundfish and invertebrate surveys are first classified into assemblages irrespective of environmental data using a cluster analysis. Second, we use a random forest model to identify the environmental correlates of the assemblages’ distributions, and then use the model to predict the presence of assemblages in parts of the study area with no biological data. The predictive performance of random forest models, which are regularly used in species distribution modelling studies, is typically equivalent to or better than that of other statistical and machine-learning methods in comparisons relating ecological data to covariate data (Prasad et al. 2006; Cutler et al. 2007; Gonzalez-Mirelis and Lindegarth 2012).

Indicator species are commonly used for the analysis of biodiversity change or for defining conservation strategies (Lindenmayer et al. 2000; De Càceres et al. 2010; Hayes et al. 2015). For the final step in our classification approach, we use an indicator species analysis (IndVal; Dufrêne and Legendre 1997), to identify species that are associated with each ecological unit delineated in the classification. The IndVal combines the species’ site fidelity with its relative frequency of occurrence to statistically determine if species are associated with one or several classes. In other words, this analysis identifies species that are well represented by each ecological unit, and therefore enhances the interpretability of the resulting map.


Study area

With approximately 36,000 km of complex shoreline, over 6500 islands and over 450,000 km2 of marine waters, Canada’s Pacific Region is a highly diverse and productive part of the ocean (Fig. 1). The waters on the BC coast are located in a transition zone dominated by the Alaska Coastal Current flowing to the north and the California Current flowing to the south. These currents shape recognized zoogeographic provinces in fish fauna where the Aleutian and Oregonian provinces overlap and a transition of algal and invertebrate composition occurs (Allen and Smith 1988; Druehl 2000; Fenberg et al. 2015). Although the location of the boundary between these zoogeographic provinces is spatially and temporally dynamic, in general, a transition zone occurs near Brooks Peninsula on the west coast of Vancouver Island at the dividing point for the two current domains (Lucas et al. 2007). This point is the basis for distinguishing two nationally designated bioregions: the Northern Shelf Bioregion (NSB) and the Southern Shelf Bioregion (SSB; Fig. 1). In this study, our objective is to better understand the distribution of mesoscale (10s to 1000s of km) benthic habitats and ecosystems within bioregions.

To complete our analyses, we gridded our study area into 4 km × 4 km cells (resulting in 6875 cells) and aggregated the biological data and resampled environmental data to this resolution. Given the inherent errors with some remotely sensed abiotic data near the coast (Tyberghein et al. 2012), the topographic complexity of the coastline, and the presence of unique local processes (such as freshwater inputs, narrow fjords and local currents), we excluded from the analysis both the grid cells that intersected with land and the Strait of Georgia Bioregion. An additional reason to remove these cells was to avoid unequal sampling area, as cells that intersected with land contained less than 4 × 4 km of ocean area. The study area ranges in depth from 2 to 2900 m (mean = 538 m, median = 190 m) and is shown in Fig. 1.

Biotic data sources

We used presence/absence data collected in groundfish and crab biological surveys conducted by Fisheries and Oceans Canada between 2000 and 2014 to test the biological relevance of the existing physiographic classification in the study region (BCMEC Ecosections, Fig. 1c) and to build a community-based ecological classification. The same biotic dataset was used in both analyses and included data from two biological survey programs: (1) the standardized trawl and longline groundfish biological surveys, which were undertaken annually from 2003 to 2013; and (2) the crab biological surveys, conducted using standardized trawls and traps, including data from the Tanner Crab Research survey (2000–2006) and the Crab research survey (2000–2014). Although these research surveys are conducted for specific taxa (i.e., groundfish and crabs), all species encountered are recorded. Ninety percent of the surveys were conducted between April and September with the remaining 10 % occurring between October and March (see Supplementary Material for detailed methodology). A total of 3707 cells (referred to as “sites”) contained catch records used in our analysis, at depths ranging from 7 to 2250 m (mean = 304 m, median = 160 m).

We removed species with low frequency in the dataset because they add noise to multivariate analyses and provide little information in addition to that obtained from more common species (Gauch 1982; McCune and Grace 2002, see Supplementary Material). To maximize the inclusion of species while also reducing noise and potential biases in the analysis, we chose a conservative exclusion threshold and removed species reported in less than 1 % of sites (≤37 sites). In addition, sites with single species can cause distortions in similarity analyses (Koleff et al. 2003) so sites where only one species was recorded were also removed. Our final biotic dataset included 174 species (96 species of demersal fish and 78 species of benthic invertebrate—See Supplementary Material for list) and 3615 sample sites (Fig. 1b). Survey effort was not consistent across all sites but initial models included “number of surveys” as a measure of survey effort. Results of these initial models showed that survey effort was not an accurate predictor of assemblage and accounted for less than 0.2 % mean decrease in model accuracy (see next section). Therefore survey effort was removed from further analyses.

Analytical methods

Species composition dissimilarity matrix

To determine the similarity of species composition at each site relative to all other sites, we calculated a matrix of pairwise βsim distance values (R package ‘simba’, Jurasinski and Retzer 2012). The βsim distance (also called Simpson distance or dissimilarity; Koleff et al. 2003; Baselga 2010) describes compositional turnover across sites, independent of species richness (Baselga 2010). This metric has been shown to perform well for presence–absence data (Koleff et al. 2003). βsim is defined as:
$$\beta_{sim} = 1 - \frac{a}{\min(b, c) + a}$$
where a is the number of shared species between sites, and b and c are the respective number of species unique to each site. βsim values range from 0 to 1, with 0 indicating no dissimilarity (identical species composition) and 1 indicating full dissimilarity (no shared species; Kreft and Jetz 2010). This pairwise distance matrix was used in the test of the BCMEC Ecosections and in the cluster analysis discussed in the next sections.
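As a minimal illustration of the βsim definition above, the following Python sketch computes the distance for sites represented as sets of species names. The authors computed the matrix with the R package ‘simba’; the function names `beta_sim` and `beta_sim_matrix` and the three toy sites are assumptions made here for illustration only.

```python
def beta_sim(site_x, site_y):
    """Simpson dissimilarity between two sites given as sets of species names."""
    a = len(site_x & site_y)   # species shared by both sites
    b = len(site_x - site_y)   # species unique to the first site
    c = len(site_y - site_x)   # species unique to the second site
    if a + min(b, c) == 0:
        raise ValueError("undefined when a site has no species")
    return 1 - a / (min(b, c) + a)


def beta_sim_matrix(sites):
    """Symmetric matrix of pairwise beta_sim values for a list of sites."""
    n = len(sites)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = beta_sim(sites[i], sites[j])
    return dist


# Hypothetical toy sites (not taken from the survey data).
site_a = {"sablefish", "lingcod", "english sole"}
site_b = {"sablefish", "lingcod"}            # nested subset of site_a
site_c = {"sablefish", "pacific grenadier"}

print(beta_sim(site_a, site_b))  # nested sites: no turnover
print(beta_sim(site_a, site_c))  # one shared species, one unique to the poorer site
```

Because the denominator uses min(b, c), a species-poor site that is a subset of a richer site has a distance of 0, which is what makes the metric independent of differences in species richness.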

Biological validation of BCMEC ecosections

To determine if the BCMEC Ecosections (Fig. 1c) represented benthic biological diversity patterns, we assigned each site to the Ecosection in which its centre point fell. Of the 12 marine Ecosections, five (with sample sizes from 281 to 1372) were included in the analysis: Continental Slope, Dixon Entrance, Hecate Strait, Queen Charlotte Sound, and Vancouver Island Shelf. The remaining seven Ecosections fell outside of or had limited overlap with our study area (≤35 sites each) and were not analyzed further.

We used a permutational analysis of variance (PERMANOVA, Anderson 2001; McArdle and Anderson 2001) to test whether the species composition was significantly different among groups (Ecosections) and a test of the homogeneity of multivariate dispersions among groups (PERMDISP, Anderson 2006) to help interpret the PERMANOVA results. A significant PERMANOVA result can be due to differences in centroid location among groups (i.e., differences in species composition), differences in spread (variance), or a combination of the two (Anderson and Walsh 2013). PERMDISP tests if the average within-group dispersion, measured by the average distance to group centroid, is equal among groups. Balanced PERMANOVA tests are more robust (Anderson and Walsh 2013), so we randomly resampled the number of sites in each Ecosection to the smallest sample size (n = 281, Hecate Strait). Each test was run with 999 permutations.
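The logic of the PERMANOVA test can be sketched in a few lines of Python. This is an illustrative re-implementation of the pseudo-F statistic of Anderson (2001) computed from a distance matrix, not the ‘vegan’ code used in the study; the function names and the toy one-dimensional data are assumptions.

```python
import random
from itertools import combinations


def pseudo_f(dist, labels):
    """Return (pseudo-F, R2) for a square distance matrix and group labels."""
    n = len(labels)
    groups = sorted(set(labels))
    # Total sum of squares from all pairwise squared distances.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares, group by group.
    ss_within = 0.0
    for g in groups:
        members = [i for i in range(n) if labels[i] == g]
        ss_within += sum(dist[i][j] ** 2
                         for i, j in combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within
    f = (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))
    return f, ss_among / ss_total


def permanova(dist, labels, n_perm=999, seed=0):
    """Permutation p-value: share of label shuffles with F >= observed F."""
    rng = random.Random(seed)
    f_obs, r2 = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled)[0] >= f_obs:
            hits += 1
    return f_obs, r2, (hits + 1) / (n_perm + 1)


# Toy example: two well-separated groups on a line (hypothetical values).
points = [0, 1, 2, 10, 11, 12]
labels = ["A", "A", "A", "B", "B", "B"]
dist = [[abs(p - q) for q in points] for p in points]
f_obs, r2, p_value = permanova(dist, labels)
print(f_obs, r2, p_value)
```

The second returned value is the proportion of variation explained among groups, the effect size reported alongside the test. With 999 permutations, the smallest p-value this procedure can return is 1/1000.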

To aid in the interpretation of the results, we examined the data visually, using nonmetric multidimensional scaling (nMDS). nMDS is an iterative search for a ranking and placement of n entities on k dimensions (axes) that minimizes the stress of the k-dimensional configurations, where stress is a measure of departure from monotonicity in the relationship between the distance in the original matrix and distance in the reduced k-dimensional ordination space (McCune and Grace 2002). The nMDS plot provides a visualization of the differences in species composition among groups. Groups were defined by the Ecosection boundaries, and all sites that fell within the boundary were assigned to that Ecosection. PERMANOVA, PERMDISP, and nMDS were run in R using the adonis, betadisper, and metaMDS functions in the ‘vegan’ package (Oksanen et al. 2014).

To better understand the community structure within each Ecosection we ran an indicator species analysis using the R function indval in the “labdsv” package (Roberts 2015). IndVal calculates an indicator value for each species, ranging from 0 to 1, based on the relative frequency of each species in a group compared to all other groups, and can be interpreted as how strongly a specific species is associated with a given group. For presence–absence data, a species’ IndVal in group i is calculated as the product of the species’ specificity (the proportion of sites with that species present in group i, divided by the sum of the proportions of sites with that species present across all groups) and the species’ fidelity (the proportion of sites with presences in group i). High IndVal values show that a species is not only very frequent in a particular group, but that it is also infrequent elsewhere. A permutation test also calculates a p value for each indicator value and species. We report indicators within each group that were significant (p < 0.05) and had an IndVal value >0.25 (following Dufrêne and Legendre 1997). To compare the strength of indicator species in the BCMEC Ecosection classification with the community classification approach described in the next section, we compared the top indicator species for each Ecosection with the top indicator for each ecological unit in the community analysis. Although IndVal was designed for abundance data, it performs well for presence–absence data (Podani and Csányi 2010).
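A worked example may help make the presence–absence IndVal calculation concrete. The Python sketch below follows the formulation of Dufrêne and Legendre (1997), in which specificity is normalized by the sum of frequencies over all groups so that the value stays between 0 and 1. It omits the permutation test, and the function name `indval_presence` and the toy data are assumptions rather than part of the ‘labdsv’ package.

```python
def indval_presence(presence, groups):
    """Indicator value of one species in each group.

    presence: list of 0/1 records, one per site.
    groups:   list of group labels, one per site.
    """
    labels = sorted(set(groups))
    freq = {}
    for g in labels:
        records = [p for p, lab in zip(presence, groups) if lab == g]
        freq[g] = sum(records) / len(records)   # fidelity within group g
    total = sum(freq.values())
    if total == 0:
        return {g: 0.0 for g in labels}         # species never recorded
    # IndVal = specificity (share of summed frequency) x fidelity.
    return {g: (freq[g] / total) * freq[g] for g in labels}


# Hypothetical species seen at 3 of 4 slope sites and 1 of 4 shelf sites.
groups = ["slope"] * 4 + ["shelf"] * 4
presence = [1, 1, 1, 0, 1, 0, 0, 0]
print(indval_presence(presence, groups))
```

In this toy case the slope value is 0.75 × 0.75 and the shelf value is 0.25 × 0.25, so only the slope group would pass a 0.25 cutoff.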

Defining biological assemblages for community model

The matrix of pairwise βsim values described in the previous section was used to create a dendrogram using average clustering (“UPGMA”, unweighted pair group method with arithmetic mean). To assess the performance of the βsim distance compared to other (dis)similarity measures, we compared the cophenetic correlation coefficients (also referred to as the cluster validity index; Lessig 1972) for the dendrograms produced using the βsim, Sorensen, Jaccard, and Ochiai distances. Determining the appropriate number of clusters (k) is an enduring issue in cluster analysis (Milligan and Cooper 1985). In biogeographic studies different types of stopping rules have been used, for example, a minimum number of grid cells per cluster (Williams et al. 1999), a predetermined level of dissimilarity (e.g., Proches 2005), or the height of the nodes of dendrogram and various metrics of relative endemism within clusters (e.g., Kreft and Jetz 2010). Given the objective of our study was to delineate the study area into biologically relevant ecological units, we wanted to maximize the number of clusters to ensure all assemblages at this scale were captured, while also maximizing the number of sites classified into geographically coherent clusters. To determine the optimal cut-off we examined three metrics: (1) the proportion of sites in the most-populated clusters; (2) the spatial coherence (clumping) of the sites in each cluster; and (3) the variation in cluster size. After the dendrogram was cut, most sites were assigned to a “major cluster”, with a small number of sites that fell in small, spatially scattered clusters considered “unclassified”.
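The clustering steps described above (average-linkage dendrogram, cut at a chosen height, cophenetic correlation) can be sketched as follows. This is a deliberately naive pure-Python illustration for small matrices, not the R ‘stats’ routines used in the study; all function names and the four-site toy matrix are assumptions.

```python
def upgma(dist):
    """Average-linkage clustering; returns merges as (members_a, members_b, height)."""
    clusters = {i: [i] for i in range(len(dist))}
    merges = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for x in range(len(keys)):
            for y in range(x + 1, len(keys)):
                a, b = clusters[keys[x]], clusters[keys[y]]
                # UPGMA height: mean of all between-cluster distances.
                h = sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))
                if best is None or h < best[0]:
                    best = (h, keys[x], keys[y])
        h, kx, ky = best
        merges.append((clusters[kx], clusters[ky], h))
        clusters[kx] = clusters[kx] + clusters[ky]
        del clusters[ky]
    return merges


def cut_tree(n_sites, merges, height):
    """Clusters obtained by applying only the merges at or below `height`."""
    parent = list(range(n_sites))

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for a, b, h in merges:
        if h <= height:
            parent[find(b[0])] = find(a[0])
    groups = {}
    for i in range(n_sites):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def cophenetic_correlation(dist, merges):
    """Pearson correlation between original and dendrogram (cophenetic) distances."""
    n = len(dist)
    coph = [[0.0] * n for _ in range(n)]
    for a, b, h in merges:
        for i in a:
            for j in b:
                coph[i][j] = coph[j][i] = h
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    x = [dist[i][j] for i, j in pairs]
    y = [coph[i][j] for i, j in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((u - mx) * (v - my) for u, v in zip(x, y))
    var_x = sum((u - mx) ** 2 for u in x)
    var_y = sum((v - my) ** 2 for v in y)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical distance matrix for four sites forming two tight pairs.
d = [[0.0, 0.1, 0.8, 0.9],
     [0.1, 0.0, 0.7, 0.8],
     [0.8, 0.7, 0.0, 0.2],
     [0.9, 0.8, 0.2, 0.0]]
merges = upgma(d)
print(cut_tree(4, merges, 0.55))
print(cophenetic_correlation(d, merges))
```

A higher cophenetic correlation means the dendrogram distorts the original distances less, which is the basis for comparing dissimilarity measures.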

Cluster analysis and related routines were carried out using R packages ‘stats’, ‘dendroextras’, and ‘dendextend’ (R Core Development Team 2014; Jefferis 2014; Galili 2015).

Random forest analysis

We used a random forest analysis to identify environmental correlates of the variation in biological clusters across space, and to evaluate whether these relationships could be used to accurately predict cluster membership in areas with no biological data. Random forest is a machine-learning method that creates an ensemble or “forest” of classification trees. It avoids developing a tree model that is over-fit to the training data by using bootstrap aggregation or “bagging” to repeatedly sample the data with replacement (bootstrapping) and developing a tree for each bootstrapped dataset (Cutler et al. 2007). The “out-of-bag” sample (about 1/3 of the data) is held out of each bootstrap sample and used to evaluate the model accuracy using a metric analogous to R2, called pseudo R2 (Franklin 2009).
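The size of the out-of-bag sample follows directly from sampling with replacement: the chance that a given site is never drawn in n draws is (1 − 1/n)^n, which approaches 1/e ≈ 0.37 for large n. The short Python simulation below illustrates this; the function name and the numbers of sites and trees are arbitrary assumptions.

```python
import random


def out_of_bag_fraction(n_sites, n_trees, seed=0):
    """Average share of sites left out of a bootstrap sample of size n_sites."""
    rng = random.Random(seed)
    fractions = []
    for _ in range(n_trees):
        # Draw n_sites indices with replacement; the set keeps the distinct ones.
        in_bag = {rng.randrange(n_sites) for _ in range(n_sites)}
        fractions.append(1 - len(in_bag) / n_sites)
    return sum(fractions) / n_trees


print(out_of_bag_fraction(n_sites=1000, n_trees=200))
```

Each tree therefore has its own held-out set of roughly one third of the sites, and the out-of-bag error is computed from predictions made only by trees that did not see a given site.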

Abiotic data

Environmental rasters were resampled from their original resolutions to a 4 km × 4 km cell size to match the spatial resolution of the biological data. Although random forest can handle correlated variables (Breiman 2001), for ease of interpretability we selected a subset of the original 59 environmental variables that were not highly correlated (R2 for each pair of variables <0.7; R package ‘corrplot’, Wei 2013), had coverage for the entire study area, and have been shown to be biologically important (reviewed by Harris and Baker 2012; see Supplementary Material for analysis). We retained 12 variables, including depth, rugosity, flow (summer and winter), tidal direction, tidal speed, bottom salinity (range), bottom temperature (range), sea surface temperature (overall), and concentrations of phosphate, dissolved oxygen, and silicate (see Supplementary Material). Of the original 3615 sampling sites used in the cluster analysis, 3496 were assigned to a major dendrogram cluster and had associated data for all 12 predictor variables. Only these 3496 sites were used in the random forest analysis.
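One simple way to apply a pairwise correlation threshold is a greedy filter that keeps a variable only if its R2 with every variable already kept is below the cutoff. The Python sketch below shows this idea under stated assumptions: the authors also weighed data coverage and biological relevance, their exact selection procedure is not described here, and the variable names and values are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)   # undefined for constant variables


def drop_correlated(variables, threshold=0.7):
    """Keep variables, in priority order, whose R2 with all kept ones is < threshold.

    variables: dict mapping a name to a list of values (one per grid cell).
    """
    kept = []
    for name, values in variables.items():
        if all(r_squared(values, variables[k]) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical layers sampled at five grid cells.
layers = {
    "depth": [1, 2, 3, 4, 5],
    "pressure": [2, 4, 6, 8, 10.5],   # nearly a linear function of depth
    "rugosity": [3, 1, 4, 1, 5],
}
print(drop_correlated(layers))
```

The outcome of a greedy filter depends on the order in which variables are considered, so the variables judged most biologically important would be listed first.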

Model parameters and performance metrics

The random forest model was implemented in R, using the ‘randomForest’ package (Liaw and Wiener 2002) with default settings and 10,000 trees for each run. The accuracy of the model was assessed as 100−(% out-of-bag error), as well as with tenfold cross validation. For cross-validation, the input data were randomly divided into ten subsamples; each subsample (10 % of full dataset) was used to test the prediction accuracy of a 10,000-tree model built on the remaining data (90 %). Model fits were quantified using the area under the receiver operating characteristic curve (AUC; function auc in R package ‘pROC’, Robin et al. 2011). AUC values typically range from 0.5 for classifiers that perform no better than random to 1.0 for perfect classification (Fawcett 2006). AUC values from each cross-validation run (n = 10) were averaged to assess overall fit of the model. The relative importance of each predictor variable was also obtained from the cross-validation analysis, by taking the average of the mean decrease in model accuracy for each predictor for each 90–10 split. The variable importance plots were examined to assess the importance of each predictor in the classification of each biological cluster individually, and for the overall model.
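The cross-validation and AUC steps can be illustrated with a small Python sketch. It shows a random tenfold partition and a two-class AUC computed as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case. The study used the R package ‘pROC’ with five clusters, which requires a multi-class extension not shown here; the function names and toy scores are assumptions.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Randomly partition indices 0..n-1 into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def auc(scores, labels):
    """Two-class AUC: share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities for six held-out sites.
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(auc(scores, labels))

folds = k_fold_indices(25, k=10)
print([len(f) for f in folds])
```

In a full run, each fold would serve once as the test set for a model trained on the other nine, and the ten resulting AUC values would be averaged.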

Data from the 12 predictor environmental variables were available for 6814 of 6875 (99.7 %) grid cells within our study site. The random forest model was projected onto these layers to get a surface of predicted cluster membership for sites that were not used to create the model (i.e., sites not assigned to a major cluster and sites without biological data). The model input data covered just over half of the study site (3480 of 6586 grid cells). Using the modelled relationships between the environmental data and the biological assemblage data, we delineated mesoscale ecological units. Each identified ecological unit refers to the biological assemblage and the dominant environmental characteristics shaping that assemblage as predicted by our model, for the entire study area. We further examined the uncertainty in the model by mapping the percentage of votes underlying each predicted cluster classification (the output of the random forest model). This evaluation provides a visualization of the underlying uncertainty in the predicted surface to identify areas of poorer model fit in the classification.
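The vote-based uncertainty measure can be sketched as follows: for each grid cell, the predicted cluster is the one with the most tree votes, and the share of votes it received serves as a confidence value. The Python example below uses hypothetical vote counts, and the 0.6 flagging threshold is an arbitrary assumption, not a value from the study.

```python
def vote_summary(votes):
    """Return (predicted cluster, share of tree votes) for one grid cell.

    votes: dict mapping cluster name to the number of trees voting for it.
    """
    total = sum(votes.values())
    winner = max(votes, key=votes.get)
    return winner, votes[winner] / total


def flag_transition(cells, threshold=0.6):
    """Names of cells whose winning vote share falls below `threshold`."""
    return [name for name, votes in cells.items()
            if vote_summary(votes)[1] < threshold]


# Hypothetical vote counts from a 10,000-tree forest.
cells = {
    "core_cell": {"Shelf": 9200, "Troughs": 500, "Slope": 300},
    "boundary_cell": {"Shelf": 4800, "Slope": 4600, "Troughs": 600},
}
for name, votes in cells.items():
    print(name, vote_summary(votes))
print(flag_transition(cells))
```

Both cells are assigned to the same cluster, but the boundary cell wins with fewer than half of the votes, which is the kind of pattern a vote-share map makes visible.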

The species assemblages within ecological units were assessed using the same analysis described for the Ecosections. We determined indicator species using IndVal, and ran balanced PERMANOVA, PERMDISP, and nMDS analyses on the ecological units for comparison with the Ecosection results. PERMANOVA is generally used to test a priori groups such as the Ecosections, therefore its use in testing our units could be considered somewhat circular, given that we defined the clusters using the same data. However, given that the ecological units represent the modelled results (not the clusters themselves), we continued with the analysis for comparative purposes.


Biological validation of ecosections

The PERMANOVA results revealed significant differences in species composition among Ecosections (F = 24.43, df = 4, p < 0.0001). However, the PERMDISP test rejected the null hypothesis of homogeneity of multivariate dispersion among all groups (F = 5.8, df = 4, p < 0.001), indicating that the significant PERMANOVA result could be driven by differences in multivariate spread in the data within groups. The effect size shows that only 32 % of the variation is explained by Ecosections, leaving 68 % of the variation within groups (see Supplementary Material for PERMANOVA tables).

The nMDS plot to examine the similarity of Ecosections in multidimensional space indicates high overlap between three Ecosections (Dixon Entrance, Queen Charlotte Sound, and Vancouver Island Shelf), with only the Continental Slope Ecosection and to a lesser degree, Hecate Strait, showing any distinction from the others (Fig. 2a). To determine if the continental slope was driving the significant results of the PERMANOVA, we removed it and reran the analysis. With continental slope removed, the PERMANOVA showed a significant result (F = 80.3, df = 3, p < 0.001), but the amount of variation among groups decreased from 32 to 17 %.
Fig. 2

Non-metric multidimensional scaling plot of a BCMEC Ecosections and b Ecological units, with 95 % confidence ellipses

Indicator species analysis on the Ecosections returned 166 species that were significant at p < 0.05. A high IndVal value (ranging from 0 to 1) indicates that a species is very frequent in that particular Ecosection and infrequent in other Ecosections, reflecting the strength of its association with a particular habitat found within that Ecosection. Applying an IndVal cutoff of 0.25 (following Dufrêne and Legendre 1997) left only 15 species associated with three of the five Ecosections (Table 1). No strong indicators were present for the Hecate Strait and the Vancouver Island Shelf Ecosections. The Continental Slope Ecosection had the highest IndVal values with Grooved Tanner Crab (Chionoecetes tanneri, 0.47), Giant Grenadier (Albatrossia pectoralis, 0.44) and Pacific Grenadier (Coryphaenoides acrolepis, 0.40) showing strong associations.
Table 1

Indicator species for BCMEC Ecosections produced using survey data on demersal fish and invertebrates. Species listed in order of descending IndVal metric

Ecosection name | Species name | Common name | Frequency in ecosection (% grid cells inhabited) | IndVal in ecosection
Continental Slope | Chionoecetes tanneri | Crab, Grooved Tanner | | 0.47
Continental Slope | Albatrossia pectoralis | Grenadier, Giant | | 0.44
Continental Slope | Coryphaenoides acrolepis | Grenadier, Pacific | | 0.40
Continental Slope | Anoplopoma fimbria | Sablefish | |
Continental Slope | Sebastolobus alascanus | Thornyhead, Shortspine | |
Continental Slope | Sebastolobus altivelis | Thornyhead, Longspine | |
Continental Slope | Lithodes couesi | Crab, Scarlet King | |
Vancouver Island Shelf | No IndVal > 0.25 | | |
Hecate Strait | No IndVal > 0.25 | | |
Dixon Entrance | Ophiodon elongatus | Lingcod | |
Queen Charlotte Sound | Psettichthys melanostictus | Sole, Pacific Sand | |
Queen Charlotte Sound | Podothecus accipenserinus | Poacher, Sturgeon | |
Queen Charlotte Sound | Lepidopsetta bilineata | Sole, Rock | |
Queen Charlotte Sound | Isopsetta isolepis | Sole, Butter | |
Queen Charlotte Sound | Parophrys vetulus | Sole, English | |
Queen Charlotte Sound | Pisaster brevispinus | Sea Star | |
Queen Charlotte Sound | Microgadus proximus | Tomcod, Pacific | |

All IndVal values reported here are significant (p < 0.05) using a permutation test. Taxonomic names shown are those used throughout our analyses, as cross-referenced with World Register of Marine Species (WoRMS Editorial Board 2015)

Cluster analysis

Our assessment of the performance of the βsim distance compared to other (dis)similarity measures using the cophenetic correlation showed that βsim performed almost twice as well as the other distances (βsim cophenetic r = 0.672, vs. 0.356, 0.337, and 0.365 for Sorensen, Jaccard, and Ochiai, respectively), supporting its suitability for our dataset. Through examination of the proportion of classified sites, the height of the nodes of the tree (h) and the decrease in variance from moving from k clusters to k + 1 clusters (See Supplementary Material for dendrogram cutoff figures), we cut the tree at βsim = 0.55 to produce five major clusters of similar species composition. Most (96.8 %) sites were assigned to these five clusters (Fig. 3a), which revealed areas of spatially coherent species assemblages (Fig. 3b). The clusters are associated with broad geomorphological features which we have used to name the assemblages: (1) Shelf (n = 1755); (2) Troughs (n = 914); (3) Slope (n = 532); (4) Dogfish Bank (n = 180); and (5) Other Banks (n = 118).
Fig. 3

a Dendrogram cut at βsim height of 0.55, b spatial distribution of sites falling within each cluster c output from random forest model and d highlighted areas of low model fit, showing potential transition zones

Random forest analysis and delineation of ecological unit

The environmental variables included in the random forest model accurately classified each cluster with an out-of-bag misclassification rate of 15.96 % (pseudo R2 = 84.04). The predictive power of the model was high, with an AUC for cross-validation of the random forest model of 0.93 ± 0.01. An AUC value above 0.9 is considered to indicate high model performance and suggests that the clusters are well explained by the environmental variables included in the model.

The variable importance plots (Fig. 4) show that depth, bottom salinity range, and bottom temperature range are the most important environmental parameters among those evaluated for differentiating the clusters. When each cluster is examined separately, temperature range appears to be less important for the Other Banks and Troughs clusters than it is for the other clusters. Tidal speed is an important variable for the Other Banks and Slope clusters, and phosphate concentration is relatively important at Dogfish Bank.
Fig. 4

Variable importance (mean ± SD decrease in accuracy) of random forest model (ten runs with 70 % of data, 10,000 trees each run) for whole model and for each ecological unit

Using the relationships between the environmental data and the biological assemblage data, ecological units were delineated across the study area (Fig. 3c). Although the overall model AUC was high (0.93), we mapped uncertainty (quantified by the percentage of votes for the designated cluster, the measure used in the random forest model) to highlight the underlying uncertainty in the model (Fig. 3d). The uncertainty map indicates that, in general, the areas surrounding the boundaries of each ecological unit have a lower percentage of votes in the random forest model than areas in the core of each assemblage. This is particularly true at the southern boundary of Dogfish Bank, around Other Banks, and along the length of the transition from Shelf to Slope at the shelf break. This suggests that the model does not perform as well in transition zones, where the species composition is changing across the environmental gradient. An additional area with high model uncertainty (lower model performance) is around the southern tip of Vancouver Island (i.e., Juan de Fuca Strait), suggesting that variables included in the model have low predictive power close to land.
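The vote-based uncertainty measure described above can be sketched as follows. Each tree in a random forest votes for one ecological unit per grid cell, and the share of votes received by the winning unit is taken as the level of support. The vote counts, cell names and the 0.6 cut-off for flagging low support are all hypothetical choices for illustration; the study does not state a threshold.

```python
from collections import Counter


def vote_support(tree_votes):
    """Return the winning unit and the fraction of trees that voted for it."""
    counts = Counter(tree_votes)
    unit, n_votes = counts.most_common(1)[0]
    return unit, n_votes / len(tree_votes)


# Hypothetical per-tree votes for two grid cells (100 trees each).
cells = {
    "core_shelf_cell": ["Shelf"] * 92 + ["Troughs"] * 8,
    "shelf_break_cell": ["Shelf"] * 48 + ["Slope"] * 45 + ["Troughs"] * 7,
}

LOW_SUPPORT = 0.6  # hypothetical cut-off, chosen only for this example

for name, votes in cells.items():
    unit, support = vote_support(votes)
    status = "possible transition zone" if support < LOW_SUPPORT else "well supported"
    print(f"{name}: {unit} ({support:.0%} of votes, {status})")
```

Both cells are assigned to Shelf, but the second one is split almost evenly between Shelf and Slope, which is the pattern expected along a unit boundary.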

Biological patterns of ecological units

Using a balanced design (randomly resampled to n = 119 per ecological unit), the PERMANOVA results showed significant differences in species assemblages among groups delineated using the random forest approach (F = 218.65, df = 4, p < 0.001). However, the PERMDISP test rejected the null hypothesis of homogeneity of multivariate dispersion among all groups (F = 9.8531, df = 4, p < 0.001), indicating that the significant PERMANOVA result could be driven by the spread in the data within groups. Nevertheless, the PERMANOVA results show that 60 % of the variation is explained among ecological units and 40 % within ecological units. This is in contrast to the results of the Ecosections PERMANOVA, where the majority of the variation (68 %) was due to variation within Ecosections and only 32 % was due to variation among Ecosections.
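The partition of variation reported above can be illustrated with a minimal PERMANOVA sketch following Anderson (2001): sums of squares are computed directly from a distance matrix, the pseudo-F statistic compares among-group to within-group variation, and significance is assessed by permuting group labels. The sketch assumes that the reported percentages correspond to the ratio of among-group to total sums of squares. The six-site distance matrix, group labels, number of permutations and seed are hypothetical.

```python
import random


def sums_of_squares(dist, groups):
    """Total and within-group sums of squares from a symmetric distance matrix."""
    n = len(groups)
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for g in set(groups):
        idx = [i for i, label in enumerate(groups) if label == g]
        pair_sum = sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += pair_sum / len(idx)
    return ss_total, ss_within


def permanova(dist, groups, n_perm=999, seed=1):
    """Return pseudo-F, the among-group share of variation, and a permutation p value."""
    n, a = len(groups), len(set(groups))

    def pseudo_f(labels):
        ss_total, ss_within = sums_of_squares(dist, labels)
        return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))

    f_obs = pseudo_f(groups)
    ss_total, ss_within = sums_of_squares(dist, groups)
    r2 = (ss_total - ss_within) / ss_total

    rng = random.Random(seed)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break the link between sites and groups
        if pseudo_f(labels) >= f_obs:
            hits += 1
    return f_obs, r2, (hits + 1) / (n_perm + 1)


# Hypothetical data: sites 0-2 in unit A, sites 3-5 in unit B.
groups = ["A", "A", "A", "B", "B", "B"]
dist = [[0.0 if i == j else (0.2 if groups[i] == groups[j] else 0.8)
         for j in range(6)] for i in range(6)]

f_obs, r2, p_value = permanova(dist, groups)
print(f"pseudo-F = {f_obs:.1f}, among-group share = {r2:.2f}, p = {p_value:.3f}")
```

With only six sites there are few distinct permutations, so the p value in this toy case cannot be small; the point of the example is the partition itself. A companion dispersion test (PERMDISP) would still be needed to check whether differences in within-group spread contribute to the result, as noted in the text.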

We used an nMDS plot to examine the similarity of the resultant ecological units in multidimensional space (Fig. 2b). The plot, which shows the 95 % ellipses, indicates an improvement in the distinctness of ecological units compared to the Ecosections’ nMDS, with considerably less overlap among groups. Most of the overlap that does occur is between physically similar groups such as Dogfish Bank and Other Banks. The Shelf shows overlap with all other ecological units except for the Slope. The Slope, similar to the Continental Slope Ecosection, is the most distinct assemblage, with only a small amount of overlap with Troughs.

For the ecological units, the indicator species analysis IndVal returned 160 species that were significant at p < 0.05 (permutational test in IndVal). Following Dufrêne and Legendre (1997), we considered strong indicators to be those with IndVal values ≥ 0.25. This cut-off left 36 species associated with specific ecological units (Table 2): 13 species were strongly associated with Dogfish Bank, ten with Troughs, seven with Slope, four with Other Banks and two with Shelf. IndVal values for strong indicator species were significantly higher across the ecological units developed in this study (mean = 0.406 ± 0.133, maximum = 0.722) than across the Ecosections (mean = 0.322 ± 0.074, maximum = 0.469; F1,7 = 5.7943, p = 0.02044; see Supplementary Material) and therefore indicate an improvement in ecological representation in these units over those delineated by Ecosection boundaries.
Table 2

Indicator species for ecological units produced by random forest analysis of species assemblages (96 demersal fish and 78 invertebrates) listed in order of IndVal metric

Unit name | Species name | Common name | Frequency in unit (% grid cells inhabited) | IndVal in unit
Slope | Albatrossia pectoralis | Grenadier, Giant | |
Slope | Coryphaenoides acrolepis | Grenadier, Pacific | |
Slope | Chionoecetes tanneri | Crab, Grooved Tanner | |
Slope | Sebastolobus altivelis | Thornyhead, Longspine | |
Slope | Lithodes couesi | Crab, Scarlet King | |
Slope | Anoplopoma fimbria | Sablefish | |
Slope | Coryphaenoides cinereus | Grenadier, Popeye | |
Troughs | Sebastes babcocki | Rockfish, Redbanded | |
Troughs | Sebastes alutus | Perch, Pacific Ocean | |
Troughs | Sebastes aleutianus | Rockfish, Rougheye | |
Troughs | Sebastolobus alascanus | Thornyhead, Shortspine | |
Troughs | Pandalopsis dispar | Shrimp, Sidestripe | |
Troughs | Atheresthes stomias | Flounder, Arrowtooth | |
Troughs | Microstomus pacificus | Sole, Dover | |
Troughs | Glyptocephalus zachirus | Sole, Rex | |
Troughs | Strongylocentrotus fragilis | Sea Urchin, Pink | |
Troughs | Sebastes diploproa | Rockfish, Splitnose | |
Shelf | Sebastes ruberrimus | Rockfish, Yelloweye | |
Shelf | Eopsetta jordani | Sole, Petrale | |
Dogfish Bank | Psettichthys melanostictus | Sole, Pacific Sand | |
Dogfish Bank | Podothecus accipenserinus | Poacher, Sturgeon | |
Dogfish Bank | Metacarcinus magister | Crab, Dungeness | |
Dogfish Bank | Pisaster brevispinus | Sea Star | |
Dogfish Bank | Parophrys vetulus | Sole, English | |
Dogfish Bank | Lepidopsetta bilineata | Sole, Rock | |
Dogfish Bank | Pycnopodia helianthoides | Sea Star | |
Dogfish Bank | Isopsetta isolepis | Sole, Butter | |
Dogfish Bank | Chitonotus pugetensis | Sculpin, Roughback | |
Dogfish Bank | Microgadus proximus | Tomcod, Pacific | |
Dogfish Bank | Lumpenus sagitta | Prickleback, Snake | |
Dogfish Bank | Raja binoculata | Skate, Big | |
Dogfish Bank | Pleuronichthys decurrens | Sole, Curlfin | |
Other Banks | Sebastes maliger | Rockfish, Quillback | |
Other Banks | Hexagrammos decagrammus | Greenling, Kelp | |
Other Banks | Chlamys rubida | Scallop, Pink | |
Other Banks | Hydrolagus colliei | Ratfish, Spotted | |

All IndVal values reported here are significant (p < 0.05) using a permutation test.

The Slope ecological unit, similar to the Continental Slope Ecosection, had among the highest IndVal values of any unit, with three species having IndVal values over 0.65 (IndVal = 0.71 for Grooved Tanner Crab, Chionoecetes tanneri; IndVal = 0.71 for Giant Grenadier, Albatrossia pectoralis; and IndVal = 0.71 for Pacific Grenadier, Coryphaenoides acrolepis), providing further support that these three species have a strong association with slope habitats. Dogfish Bank had the highest IndVal value of any ecological unit, 0.72 for the Pacific Sand Sole (Psettichthys melanostictus), whereas the highest IndVal values for Troughs were 0.55 for Redbanded Rockfish (Sebastes babcocki) and 0.55 for Pacific Ocean Perch (Sebastes alutus). Yelloweye Rockfish (Sebastes ruberrimus) was the strongest indicator for Shelf, occurring in 46 % of Shelf sites; however, its IndVal of 0.28 reflects its occurrence in other ecological units (8 % of Trough sites, 20 % of Other Bank sites). In contrast, the Giant Grenadier (A. pectoralis) has a high frequency in the Slope (75 % of sites) and also a very high IndVal value (0.71), indicating that it is rarely found in other ecological units (also observed in 4 % of Trough sites).
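The link between frequency of occurrence and IndVal can be checked with a short calculation. Following Dufrêne and Legendre (1997), IndVal is the product of specificity (how concentrated a species is in one unit) and fidelity (the fraction of that unit's sites where it occurs). The sketch assumes presence/absence data, so that a species' mean abundance in a unit equals its frequency there, and it assumes the species is absent from any unit not mentioned in the text. The frequencies are the rounded percentages quoted above.

```python
def indval(freq_by_unit, target_unit):
    """IndVal for one species in one unit, from per-unit frequencies of occurrence."""
    fidelity = freq_by_unit[target_unit]                  # share of target-unit sites occupied
    specificity = fidelity / sum(freq_by_unit.values())   # concentration in the target unit
    return specificity * fidelity


# Frequencies quoted in the text (fractions of sites); unlisted units assumed to be zero.
giant_grenadier = {"Slope": 0.75, "Troughs": 0.04}
yelloweye_rockfish = {"Shelf": 0.46, "Troughs": 0.08, "Other Banks": 0.20}

print(round(indval(giant_grenadier, "Slope"), 3))     # about 0.71
print(round(indval(yelloweye_rockfish, "Shelf"), 3))  # about 0.29 from the rounded inputs
```

The Giant Grenadier value matches the reported 0.71. The Yelloweye Rockfish value comes out slightly above the reported 0.28, which is plausibly due to the rounding of the quoted percentages. In both cases the calculation shows how occurrence in other units lowers specificity and therefore IndVal, even when the species is frequent in its own unit.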


Integrated approach improves ecological classification

The results of this study provide a new mesoscale ecological classification that can be used in marine spatial planning in the Pacific Region of Canada. A current initiative to develop an MPA network on the coast of British Columbia requires information about the distribution of biodiversity in the planning region, and this study provides an initial step in understanding the coarse-scale benthic community patterns and associated environmental heterogeneity. Studies have shown that building reserves using biological data produces more representative reserves (e.g., Sutcliffe et al. 2015), yet reserves built solely on abiotic surrogates are still more representative than randomly selected sites (Rodrigues and Brooks 2007; Beier et al. 2015). Here we tested the biological relevance of an existing physiographic classification based solely on abiotic data and found that it reflects significant compositional turnover in benthic species in our study area. However, we also showed that an integrated approach, using biotic and abiotic data, better represents distinct benthic species assemblages and their associated habitat. Our analyses showed that few species have strong associations with the physiographically based Ecosections, with the exception of the Continental Slope. The nMDS analysis highlighted the high overlap among Ecosections, and only the Continental Slope Ecosection displays a visually distinct assemblage. This suggests that if the physiographic classification were used in MPA planning to fulfil the representativity criterion, the continental slope species would be represented in the network, but the remaining Ecosection boundaries would not be representative of turnover in benthic species and habitat diversity. However, by focussing on available data for benthic species of fish and invertebrates, we were only able to test aspects of the biological relevance of the Ecosections.
Given that the Ecosections were built with information on ocean stratification and mixing, they may better represent pelagic biodiversity patterns. Further work is needed to examine how pelagic diversity is structured in comparison to the Ecosection boundaries. In terms of representing meso-scale patterns of benthic biodiversity, our approach provides an improvement and useful alternative to the Ecosections as a meso-scale benthic habitat layer to fulfil the ecological representativity criterion in MPA network planning. This result strengthens the conclusion that classifying habitats using biological information creates ecologically relevant habitat units that better represent species-environment relationships than classifications built on abiotic variables alone (Hewitt et al. 2004; Eastwood et al. 2006; Rooper and Zimmermann 2007; Shumchenia and King 2010).

Community driven classification

The vast majority of benthic marine organisms are limited by some combination of depth, substrate type, temperature, and salinity, but the complexity of the relationships to these variables is less well understood (Roff and Zacharias 2011; Harris 2012b). A review of 57 studies on mapping marine benthic communities found that water depth, followed by substrate type, was the most useful surrogate for delineating benthic communities (Harris and Baker 2012). Our results support this finding, with depth emerging as the strongest driver structuring the biological assemblages across our study area, followed by temperature and salinity range. Interestingly, Harris and Baker (2012) found that water properties including temperature and salinity were not as good surrogates as other seabed characteristics such as acoustic backscatter, grain size and rugosity, likely due to the non-linear and complex responses of species to changes in temperature and salinity (Harris 2012b). Unfortunately, a reliable map of substrate type or acoustic backscatter is not currently available at the scale of this study, and the available grain size model did not cover the full extent of our study area. We did include rugosity in the analysis; however, it did not have high predictive power in our model (Fig. 4), perhaps because it was resampled from 100 m to 4 km to meet our sample resolution. More local analyses following similar methods could be carried out in areas where finer-scale data are available within the study area, as other research has shown that this community modeling approach performs well at the biotope scale (10–100 m; Gonzalez-Mirelis and Lindegarth 2012).

We found five coarse-scale habitats in our study area with somewhat distinct biological communities: Shelf, Troughs, Other Banks, Dogfish Bank and the Slope. Interestingly, we found that the group of species found on Dogfish Bank, the largest shallow bank in the region (Clarke and Jamieson 2006), was distinct from other banks in the study area. This result supports the identification of Dogfish Bank as an Ecologically and Biologically Significant Area (EBSA, Clarke and Jamieson 2006). An expert-driven process identified Dogfish Bank as an EBSA because it is the largest, shallowest bank in the region, an important area of aggregation for marine birds and Dungeness Crab, and rearing habitat for flatfish and invertebrate larvae (Clarke and Jamieson 2006). Our analysis identified four species of flatfish (Pacific Sand Sole, Rock Sole, English Sole, and Butter Sole) as indicator species for Dogfish Bank based on their high frequency. Although all four flatfish species were also found in other areas, particularly the Rock Sole in Other Banks and Shelf, the higher frequency of flatfish in Dogfish Bank in comparison to other ecological units provides empirical evidence of its importance as flatfish habitat. Similarly, although Dungeness Crab were found in low frequencies in other units (2 % of sites in Other Banks, and 2 % of sites in Shelf), nearly half of the sites in Dogfish Bank contained Dungeness Crab (49 %), providing empirical evidence that this habitat is important for Dungeness Crab aggregations, as outlined in its EBSA designation (Clarke and Jamieson 2006). An added benefit of our community approach is the ability to develop an associated list of indicator species representative of each ecological unit, information that is important to conservation planners and managers.

Use of spatial autocorrelation patterns

Spatial autocorrelation, a pattern in which observations are related to one another by their geographic distance, is common in georeferenced ecological data (Legendre and Legendre 2012). The presence of spatial autocorrelation can create problems in species distribution models (SDM; Lennon 2000; Dormann 2007; Crase et al. 2012) such as the random forest approach taken in this paper. Spatial autocorrelation in model residuals of SDMs violates the assumption of independent and identically distributed errors and can inflate type I errors (Legendre 1993; Kühn 2007), which can lead to the selection of unimportant explanatory variables and poorly estimated parameters in SDMs (Lennon 2000; Dormann 2007). There are several approaches to test for spatial autocorrelation in species distribution model residuals (reviewed by Keitt et al. 2002; Dormann et al. 2007), but our community modeling approach differs from the typical species distribution model, making it more complicated to test for spatial autocorrelation. For example, we used geographic cohesiveness as one of several criteria for selecting a dissimilarity cut-off in our cluster analysis. We were looking for areas of similar species composition to map coarse-scale benthic biological communities across geographic space, so spatial autocorrelation was inherent in our design (and could be considered a strength, see Gonzalez-Mirelis and Lindegarth 2012).
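As a reminder of what a residual autocorrelation test measures, the sketch below computes Moran's I, one widely used statistic, for residuals along a one-dimensional transect. This test was not applied in the present study; the residual values and the neighbour definition (adjacent cells share a weight of 1) are hypothetical and chosen only to show the contrast between clustered and alternating residuals.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weight_sum = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    variance_term = sum(d * d for d in dev)
    return (n / weight_sum) * (cross / variance_term)


def chain_weights(n):
    """Binary weights for a transect: cells are neighbours if they are adjacent."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


# Hypothetical model residuals for six cells along a transect.
clustered = [1, 1, 1, -1, -1, -1]     # similar residuals sit next to each other
alternating = [1, -1, 1, -1, 1, -1]   # neighbouring residuals have opposite signs

w = chain_weights(6)
print(round(morans_i(clustered, w), 2))    # positive: spatially clustered residuals
print(round(morans_i(alternating, w), 2))  # negative: neighbouring residuals differ
```

Values near zero would indicate little spatial structure in the residuals under this neighbour definition, whereas clearly positive values signal the kind of clustering that can inflate type I errors.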

In many studies, the distribution model is fitted for the specific purpose of mapping its predictions, which involves using the mean, and the distribution of parameters is not often examined (e.g., Gonzalez-Mirelis and Lindegarth 2012). In this study, the map (Fig. 3c) is the output of interest (as opposed to the explanatory variables and parameter estimates), and because autocorrelation largely affects only the variance of effects, spatial autocorrelation is less of a concern. However, because spatial autocorrelation was not addressed in our model, we are unable to explicitly test the effects of the structuring processes of each ecological unit. In other words, although depth, salinity and temperature range are strong predictors of the biological clusters, we are limited in our interpretation regarding the strength of those correlative relationships.

Conservation planning and uncertainty

Models of natural systems, including predictive ecological models like random forests, inevitably include some degree of uncertainty. Uncertainty is not problematic per se as long as its effects on model projections are not ignored (Gould et al. 2014). However, many correlative models such as species distribution models are spatially projected without explicitly addressing uncertainty, thereby implying a confidence in model outputs that may be misleading (Beale and Lennon 2012; Wenger et al. 2013; Gould et al. 2014). Tulloch et al. (2013) stated that one of the most pervasive forms of uncertainty in data used to make conservation decisions is error associated with mapping of conservation features. While conservation planners should consider uncertainty associated with ecological data to make informed decisions (e.g., Halpern et al. 2006; Langford et al. 2009), model error is rarely accommodated in the planning process (Tulloch et al. 2013).

To better incorporate uncertainty into the planning process in the Pacific Region, we provided an uncertainty map that clearly highlights areas of lower confidence in model performance. Although our overall model performance metrics were considered high (pseudo R2 = 0.84 and AUC = 0.93), the output of the model had less support at the boundaries of the ecological units, presumably across environmental gradients. Model uncertainty in transition zones around the edges of ecological units is expected given the potentially steep environmental gradients and associated community turnover. However, these areas would be masked if not highlighted through examination of the level of support underlying the model prediction. Furthermore, transition zones are important features to consider in conservation planning, often with enhanced diversity (Araujo 2002), and mapping uncertainty allowed us to better identify transition zones.

Model predictions in close proximity to land also show higher uncertainty than surrounding areas, particularly around the southern tip of Vancouver Island. The environmental complexity in this area, including local currents and eddies, is unlikely to be adequately captured in our abiotic data, resulting in low model performance. This result supports the decision to remove sites near land and the Strait of Georgia Bioregion, and provides evidence that these areas should be modeled separately and at a local scale with finer-scale data if possible. Underlying uncertainty, particularly at the boundaries of classification units, is not captured in most rule-based classifications, like the BCMEC physiographic classification, and documentation of such uncertainty is not always made available or may only be found in technical metadata. The ability to examine the variability in model performance in a spatial context is a strength of our analytical approach and allows conservation planners and managers to explicitly consider uncertainty in the decision-making process.

Additional sources of uncertainty in our results are the limitations of our input data. Although we pooled samples over a decade to average out inter-annual variation in species presence, the majority of the biological data used in this study were collected from April through September. Therefore, our results best represent spring and summer patterns in benthic diversity and assume that large changes in species composition do not occur seasonally. Many of the species included in the analysis are low-mobility or sessile invertebrates and so are not expected to move, but other species, such as demersal fish and mobile invertebrates, may undertake seasonal movements. For example, certain Sablefish (Anoplopoma fimbria) populations have been shown to make large seasonal migrations, whereas others have been shown to remain resident year round (Maloney and Heifetz 1997; McFarlane and Saunders 2006). An additional limitation of our study is that only non-larval stages of species were included. Further studies examining patterns of pelagic diversity will hopefully be able to better incorporate larval life history stages.


We used spatially explicit data on demersal fish and benthic invertebrate species to first test the biological validity of a physiographic marine classification in the Pacific Region, BC. Second, we maximized the use of available biological data on benthic species to develop a mesoscale classification delineating ecological units that represent distinct biological assemblages for use in MPA network planning. We provided a biological validation of the BCMEC Ecosections, a physiographic classification that had not been tested against biological data prior to this study. We also showed that the representativity of a classification system can be greatly improved by integrating biotic and abiotic data into a predictive modeling framework. Our community modeling approach deepened our understanding of the spatial distribution of benthic biological communities in our study area and provides a good alternative to the existing physiographic classification for marine conservation planning. This study highlights the importance of maximizing the use of biological data in the marine conservation planning process, as well as the utility of multi-species stock assessment surveys for community analyses. Although the data used in this study were not collected for this purpose, studies suggest that it is better to move forward with conservation planning even with data limitations, rather than postponing planning efforts and risking further biodiversity loss (Ban 2009; Ban et al. 2014; Beier et al. 2015). As data become available at finer scales, we can use similar approaches to develop biologically driven classification systems that contribute to building an ecologically representative MPA network.



We are grateful for feedback and discussion from Ed Gregr, Laura Feyrer, Erin McClelland, Greig Oldford, Chris McDougall, Carrie Robb, and Karin Bodtker as well as members of the Canada-British Columbia-First Nations Marine Protected Area Technical Team. The manuscript was greatly improved by two anonymous reviewers. We also would like to thank Kate Rutherford, Leslie Barton, Jason Dunham and others who provided access and answered questions about data sources. Funding for this project was provided by the Canada-British Columbia Marine Protected Area Implementation Team and Fisheries and Oceans Canada’s National Conservation Plan Program and the Strategic Program for Ecosystem Research and Analysis.

Supplementary material

Supplementary material 1 (DOCX 1641 kb)


  1. Araujo MB (2002) Biodiversity hotspots and zones of ecological transition. Conserv Biol 16:1662–1663
  2. AXYS Environmental Consulting Ltd (2000) British Columbia Marine Ecological Classification Update – Method Options. Prepared for Land Use Coordination Office, Government of British Columbia
  3. AXYS Environmental Consulting Ltd (2001) British Columbia Marine Ecological Classification Update. Ministry of Sustainable Resource Management Decision Support Services
  4. Airamé S, Dugan JE, Lafferty KD et al (2003) Applying ecological criteria to marine reserve design: a case study from the California Channel Islands. Ecol Appl 13:S170–S184
  5. Allen MJ, Smith GB (1988) Atlas and zoogeography of common fishes in the Bering Sea and northeastern Pacific. NOAA Technical Report NMFS 66. National Marine Fisheries Service, NOAA
  6. Anderson MJ, Walsh DCI (2013) PERMANOVA, ANOSIM, and the Mantel test in the face of heterogeneous dispersions: what null hypothesis are you testing? Ecol Monog 83:557–574
  7. Anderson MJ (2001) A new method for non-parametric multivariate analysis of variance. Austral Ecol 26:32–46
  8. Anderson MJ (2006) Distance-based tests for homogeneity of multivariate dispersions. Biometrics 62:245–253
  9. Ban NC (2009) Minimum data requirements for designing a set of marine protected areas, using commonly available abiotic and biotic datasets. Biodiv Cons 18(7):1829–1845
  10. Ban NC, Vincent AC (2009) Beyond marine reserves: exploring the approach of selecting areas where fishing is permitted, rather than prohibited. PLoS One 4:e6258. doi:10.1371/journal.pone.0006258
  11. Ban NC, McDougall C, Beck M et al (2014) Applying empirical estimates of marine protected area effectiveness to assess conservation plans in British Columbia, Canada. Biol Cons 180:134–148. doi:10.1016/j.biocon.2014.09.037
  12. Baselga A (2010) Partitioning the turnover and nestedness components of beta diversity. Global Ecol Biogeog 19(1):134–143
  13. Beale CM, Lennon JJ (2012) Incorporating uncertainty in predictive species distribution modelling. Philos Trans R Soc Lond B 367:247–258
  14. Beier P, Sutcliffe P, Hjort J et al (2015) A review of selection-based tests of abiotic surrogates for species representation. Conserv Biol 29:668–679. doi:10.1111/cobi.12509
  15. Breiman L (2001) Random forests. Mach Learn 45:5–32
  16. Canada-British Columbia Marine Protected Area Network Strategy (2014) Available from Accessed 8 June 2015
  17. CBD (2010) Aichi Biodiversity Targets, Strategic Plan for Biodiversity 2011-2020 Convention on Biodiversity, Accessed 4 January 2016
  18. Ceballos G, Ehrlich P, Barnosky A et al (2015) Accelerated modern human-induced species losses: entering the sixth mass extinction. Sci Adv 1:1–5. doi:10.1126/sciadv.1400253
  19. R Core Development Team (2014) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna
  20. Crase B, Liedloff AC, Wintle BA (2012) A new method for dealing with residual spatial autocorrelation in species distribution models. Ecography 35(10):888–897
  21. Cutler DR, Edwards TC Jr, Beard KH et al (2007) Random forests for classification in ecology. Ecology 88:2783–2792
  22. De Cáceres M, Legendre P, Moretti M (2010) Improving indicator species analysis by combining groups of sites. Oikos 119:1674–1684
  23. Dormann CF (2007) Effects of incorporating spatial autocorrelation into the analysis of species distribution data. Global Ecol Biogeog 16(2):129–138
  24. Dormann CF, McPherson JM, Araújo MB et al (2007) Methods to account for spatial autocorrelation in the analysis of species distributional data: a review. Ecography 30(5):609–628
  25. Druehl L (2000) Pacific seaweeds. Harbour Publ, Madeira Park
  26. Dufrêne M, Legendre P (1997) Species assemblages and indicator species: the need for a flexible asymmetrical approach. Ecol Monog 67(3):345–366
  27. Eastwood P, Souissi S, Rogers S et al (2006) Mapping seabed assemblages using comparative top-down and bottom-up classification approaches. Can J Fish Aquat Sci 63:1536–1548
  28. Fawcett T (2006) An introduction to ROC analysis. Pattern Recogn Lett 27:861–874
  29. Fenberg PB, Menge BA, Raimondi PT, Rivadeneira MM (2015) Biogeographic structure of the northeastern Pacific rocky intertidal: the role of upwelling and dispersal to drive patterns. Ecography 38(1):83–95
  30. Ferrier S, Guisan A (2006) Spatial modelling of biodiversity at the community level. J Appl Ecol 43:393–404. doi:10.1111/j.1365-2664.2006.01149.x
  31. Franklin J (2009) Mapping species distributions: spatial inference and prediction. Cambridge University Press, New York
  32. Galili T (2015) dendextend: an R package for visualizing, adjusting, and comparing trees of hierarchical clustering. Bioinformatics. doi:10.1093/bioinformatics/btv428
  33. Gauch HG (1982) Multivariate analysis in community ecology. Cambridge University Press, Cambridge
  34. Gonzalez-Mirelis G, Lindegarth M (2012) Predicting the distribution of out-of-reach biotopes with decision trees in a Swedish marine protected area. Ecol Appl 22(8):2248–2264
  35. Gould SF, Beeton NJ, Harris RM et al (2014) A tool for simulating and communicating uncertainty when modelling species distributions under future climates. Ecol Evol 4(24):4798–4811
  36. Gregr EJ, Ahrens AL, Perry IR (2012) Reconciling classifications of ecologically and biologically significant areas in the world’s oceans. Mar Pol 36(3):716–726
  37. Halpern BS, Regan HM, Possingham HP, McCarthy MA (2006) Accounting for uncertainty in marine reserve design. Ecol Lett 9:2–11
  38. Harris PT (2012a) Biogeography, benthic ecology, and habitat classification system. In: Harris PT, Baker EK (eds) Seafloor geomorphology as benthic habitat. Elsevier, San Francisco, pp 61–87
  39. Harris PT (2012b) Surrogacy. In: Harris PT, Baker EK (eds) Seafloor geomorphology as benthic habitat. Elsevier, San Francisco, pp 93–102
  40. Harris PT, Baker EK (2012) GeoHab atlas of seafloor geomorphic features and benthic habitats: synthesis and lessons learned. In: Harris PT, Baker EK (eds) Seafloor geomorphology as benthic habitat. Elsevier, San Francisco, pp 871–890
  41. Hayes KR et al (2015) Identifying indicators and essential variables for marine ecosystems. Ecol Ind 57:409–419. doi:10.1016/j.ecolind.2015.05.006 CrossRefGoogle Scholar
  42. Hewitt JE, Thrush SE, Legendre P, Funnell GA, Ellis J, Morrison M (2004) Mapping of marine soft-sediment communities: integrated sampling for ecological interpretation. Ecol Appl 14:1203–1216. doi:10.1890/03-5177 CrossRefGoogle Scholar
  43. Jefferis G (2014) dendroextras: Extra functions to cut, label and colour dendrogram clusters. R package version 0.2.1.
  44. Johannessen D, Haggarty D, Pringle J (2004) Boundary definition for the central coast integrated management area. Can Sci Advis Sec Res Doc 2004/050Google Scholar
  45. Juffe-Bignoli D, Burgess ND, Bingham H et al (2014) Protected Planet Report 2014. UNEP-WCMC, CambridgeGoogle Scholar
  46. Jurasinski G and contributions from V. Retzer (2012). simba: a Collection of functions for similarity analysis of vegetation data. R package version 0.3-5.
  47. Keitt TH, Bjørnstad ON, Dixon PM, Citron-Pousty S (2002) Accounting for spatial pattern when modeling organism-environment interactions. Ecography 25(5):616–625
  48. Koleff P, Gaston KJ, Lennon JJ (2003) Measuring beta diversity for presence–absence data. J Anim Ecol 72:367–382
  49. Kreft H, Jetz W (2010) A framework for delineating biogeographical regions based on species distributions. J Biogeog 37(11):2029–2053
  50. Kühn I (2007) Incorporating spatial autocorrelation may invert observed patterns. Divers Distrib 13(1):66–69
  51. Langford WT, Gordon A, Bastin L (2009) When do conservation planning methods deliver? Quantifying the consequences of uncertainty. Ecol Inform 4:123–135
  52. Last PR, Lyne VD, Williams A, Davies CR, Butler AJ, Yearsley GK (2010) A hierarchical framework for classifying seabed biodiversity with application to planning and managing Australia’s marine biological resources. Biol Cons 143(7):1675–1686
  53. Legendre P (1993) Spatial autocorrelation: trouble or new paradigm? Ecology 74(6):1659–1673
  54. Legendre P, Legendre L (2012) Numerical ecology, 3rd ed. Developments in environmental modelling, vol 24. Elsevier, Amsterdam
  55. Lennon JJ (2000) Red-shifts and red herrings in geographical ecology. Ecography 23(1):101–113
  56. Lessig V (1972) Comparing cluster analyses with cophenetic correlation. J Mark Res 9:82–84
  57. Levings CD, Jamieson GS (1999) Evaluation of ecological criteria for selecting MPAs in Pacific region: a proposed semi-quantitative approach. Can Stock Assess Sec Res Doc 99/210
  58. Liaw A, Wiener M (2002) Classification and regression by randomForest. R News 2(3):18–22
  59. Lindenmayer DB, Margules CR, Botkin DB (2000) Indicators of biodiversity for ecologically sustainable forest management. Conserv Biol 14:941–950
  60. Lombard AT, Cowling RM, Pressey RL, Rebelo AG (2003) Effectiveness of land classes as surrogates for species in conservation planning for the Cape Floristic Region. Biol Cons 112(1–2):45–62
  61. Lucas BG, Verrin S, Brown R (2007) Ecosystem overview: Pacific North Coast Integrated Management Area (PNCIMA). Can Tech Rep Fish Aquat Sci 2667:xiii + 104p
  62. Maloney N, Heifetz J (1997) Movements of tagged sablefish, Anoplopoma fimbria, released in the eastern Gulf of Alaska. NOAA Technical Report NMFS 130:115–121
  63. McArdle BH, Anderson MJ (2001) Fitting multivariate models to community data: a comment on distance-based redundancy analysis. Ecology 82:290–297
  64. McCune B, Grace J (2002) Analysis of ecological communities. MjM Software Design, Gleneden Beach
  65. McFarlane G, Saunders M (2006) Dispersion of juvenile sablefish, Anoplopoma fimbria, as indicated by tagging in Canadian waters. NOAA Technical Report NMFS 130:137–150
  66. Milligan GW, Cooper MC (1985) An examination of procedures for determining the number of clusters in a data set. Psychometrika 50(2):159–179
  67. Nelson TA, Gillanders SN, Harper J, Morris M (2011) Nearshore Aquatic Habitat Monitoring: a seabed imaging and mapping approach. J Coast Res 272:348–355. doi:10.2112/jcoastres-d-10-00110.1
  68. Oksanen J, Guillaume Blanchet F, Kindt R et al (2014) vegan: community ecology package. R package version 2.3-0.
  69. Pimm SL et al (2014) The biodiversity of species and their rates of extinction, distribution, and protection. Science 344(6187):1246752. doi:10.1126/science.1246752
  70. Pitcher CR, Lawton P, Ellis N et al (2012) Exploring the role of environmental variables in shaping patterns of seabed biodiversity composition in regional-scale ecosystems. J Appl Ecol 49(3):670–679
  71. Podani J, Csányi B (2010) Detecting indicator species: some extensions of the IndVal measure. Ecol Ind 10(6):1119–1124
  72. Prasad AM, Iverson LR, Liaw A (2006) Newer classification and regression techniques: bagging and random forests for ecological prediction. Ecosystems 9:181–199
  73. Proches S (2005) The world’s biogeographical regions: cluster analyses based on bat distributions. J Biogeog 32:607–614
  74. R Core Development Team (2014) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna
  75. Robb CK (2014) Assessing the impact of human activities on British Columbia’s estuaries. PLoS One 9:e99578. doi:10.1371/journal.pone.0099578
  76. Roberts CM, Branch G, Bustamante RH et al (2003) Application of ecological criteria in selecting marine reserves and developing reserve networks. Ecol Appl 13:S215–S228
  77. Roberts DW (2015) labdsv: ordination and multivariate analysis for ecology. R package version 1.7-0.
  78. Robin X, Turck N, Hainard A et al (2011) pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinf 12:77. doi:10.1186/1471-2105-12-77
  79. Rodrigues ASL, Brooks TM (2007) Shortcuts for biodiversity conservation planning: the effectiveness of surrogates. Annu Rev Ecol Evol Syst 38:713–737. doi:10.1146/annurev.ecolsys.38.091206.095737
  80. Roff JC, Taylor ME (2000) National frameworks for marine conservation—a hierarchical geophysical approach. Aquat Cons Mar Fresh Ecosys 10:209–223
  81. Roff JC, Zacharias MA (2011) Marine conservation ecology. Earthscan, London, UK
  82. Roff JC, Taylor ME, Laughren J (2003) Geophysical approaches to the classification, delineation and monitoring of marine habitats and their communities. Aquat Cons Mar Fresh Ecosys 13(1):77–90
  83. Rooper C, Zimmermann M (2007) A bottom-up methodology for integrating underwater video and acoustic mapping for seafloor substrate classification. Cont Shelf Res 27:947–957
  84. Shumchenia EJ, King JW (2010) Comparison of methods for integrating biological and physical data for marine habitat mapping and classification. Cont Shelf Res 30:1717–1729. doi:10.1016/j.csr.2010.07.007
  85. Sutcliffe PR, Klein CJ, Pitcher CR, Possingham HP (2015) The effectiveness of marine reserve systems constructed using different surrogates of biodiversity. Conserv Biol 29(3):657–667
  86. Tulloch VJ, Possingham HP, Jupiter SD et al (2013) Incorporating uncertainty associated with habitat data in marine reserve design. Biol Cons 162:41–51. doi:10.1016/j.biocon.2013.03.003
  87. Tyberghein L, Verbruggen H, Pauly K et al (2012) Bio-ORACLE: a global environmental dataset for marine species distribution modeling. Global Ecol Biogeog
  88. Wei T (2013) corrplot: Visualization of a correlation matrix. R package version 0.73.
  89. Wenger SJ, Som NA, Dauwalter DC et al (2013) Probabilistic accounting of uncertainty in forecasts of species distributions under climate change. Glob Chang Biol 19(11):3343–3354
  90. Williams PH, de Klerk HM, Crowe TM (1999) Interpreting biogeographical boundaries among Afrotropical birds: spatial patterns in richness gradients and species replacement. J Biogeog 26:459–474
  91. Worm B, Barbier EB, Beaumont N (2006) Impacts of biodiversity loss on ocean ecosystem services. Science 314:787–790. doi:10.1126/science.1132294
  92. WoRMS Editorial Board (2015) World Register of Marine Species. Available from VLIZ. Accessed 15 May 2015
  93. Zacharias MA, Howes DE, Harper JR, Wainwright P (1998) The British Columbia marine ecosystem classification: rationale, development, and verification. Coast Manage 26(2):105–124

Copyright information

© Her Majesty the Queen in Right of Canada 2016

Authors and Affiliations

  • Emily M Rubidge (1, 2)
  • Katie S. P. Gale (1, 2)
  • Janelle M. R. Curtis (2)

  1. Institute of Ocean Sciences, Fisheries and Oceans Canada, Sidney, Canada
  2. Pacific Biological Station, Fisheries and Oceans Canada, Nanaimo, Canada
