Marine Biology (2009) 156:2641

DNA barcoding of Pacific Canada’s fishes


  • Dirk Steinke
    • Canadian Centre for DNA Barcoding, Biodiversity Institute of Ontario, University of Guelph
  • Tyler S. Zemlak
    • Canadian Centre for DNA Barcoding, Biodiversity Institute of Ontario, University of Guelph
    • Department of Biology, Dalhousie University
  • James A. Boutillier
    • Pacific Biological Station, Fisheries and Oceans Canada
  • Paul D. N. Hebert
    • Canadian Centre for DNA Barcoding, Biodiversity Institute of Ontario, University of Guelph
Comment and Reply

DOI: 10.1007/s00227-009-1284-0

Cite this article as:
Steinke, D., Zemlak, T.S., Boutillier, J.A. et al. Mar Biol (2009) 156: 2641. doi:10.1007/s00227-009-1284-0


Abstract

DNA barcoding, the sequencing of a standard region of the mitochondrial cytochrome c oxidase I gene (COI), promises a rapid, accurate means of identifying animals to the species level. This study establishes that sequence variability in the barcode region permits discrimination of 98% of 201 fish species from the Canadian Pacific. The average sequence variation within species was 0.25%, while the average distance separating species within genera was 3.75%. The latter value was considerably lower than values reported in other studies, reflecting the dominance of the Canadian fauna by members of the young and highly diverse genus Sebastes. Although most sebastids possessed distinctive COI sequences, four species did not. Partly offsetting these cases, the barcode records indicated the presence of a new, broadly distributed species of Paraliparis and the possibility that Paraliparis pectoralis is actually a species pair. The present study shows that most fish species in Pacific Canadian waters correspond to a single, tightly cohesive array of barcode sequences distinct from those of any other species, but it also highlights some taxonomic issues that need further investigation.


Introduction

The limitations inherent in morphology-based identification systems and the dwindling pool of taxonomists signal the need for a new approach to species recognition. DNA barcoding seeks to advance both species identification and discovery through the analysis of patterns of sequence divergence in a standardized gene region. Many studies have now shown the effectiveness of a 650-bp fragment of the cytochrome c oxidase I (COI) gene for species identification in varied animal lineages (Hebert et al. 2003a, b; Barrett and Hebert 2004; Hebert et al. 2004a, b; Hogg and Hebert 2004; Smith et al. 2005; Hajibabaei et al. 2006), including fishes (Ward et al. 2005; Hubert et al. 2008). Ward et al. (2005) provided early evidence for the efficacy of DNA barcoding in marine fish identification in a study that examined more than 200 Australian species. A subsequent study of the molecular evolutionary behavior of COI in fishes suggests that DNA barcoding should be extensible to all marine fishes (Ward and Holmes 2007). Other studies report on the utility of barcoding for testing species boundaries and highlighting potentially overlooked species (Smith et al. 2008b; Ward et al. 2008a, b).

A comprehensive database of COI sequences, linked to authoritatively identified voucher specimens for all fishes, promises a significant advance for fisheries science (Ward et al. 2009). Beyond identifying whole specimens, barcode analysis opens up new possibilities: it can provide identifications at any developmental stage (Steinke et al. 2005; Pegg et al. 2006) and identify fragmentary or processed remains (Smith et al. 2008a; Wong and Hanner 2008). Combined with the potential for rapid, automated sample processing (Garland and Zimmer 2002), DNA barcoding could soon provide a powerful foundation for the accurate and unambiguous identification of fish and fish products from eggs to adults. This would allow surveillance of species substitutions in the marketplace, assist sustainable fisheries management, and improve ecosystem research and conservation. However, these applications require a barcode reference library with comprehensive taxonomic and geographic coverage, a task that will necessitate the analysis of one faunal region after another.

Hart (1973) recognized 325 marine fish species in coastal and offshore Canadian waters of the North Pacific, but the total could be as high as 420 (Froese and Pauly 2006). This fauna includes species in just over 200 different families. Most families are represented by only one or a few species, but flatfishes, eelpouts, salmonids, snailfishes, and rockfishes are more diverse. Rockfishes (Sebastidae) are the most diverse family, with 36 species in two genera (Sebastes and Sebastolobus). Members of this family are challenging to identify because subtle differences in spine orientation and pigmentation are often the only characters that distinguish closely related species. Further complications include morphological variation within species (e.g., color morphs) and sibling species that lack diagnostic morphological differences but show clear genetic divergence (Gharrett et al. 2005). Species-rich groups like rockfishes present a significant challenge for DNA barcoding because some species are thought to have diversified very recently. In fact, all 100 species in the genus Sebastes are thought to have diversified within the last 8–9 million years (Hyde and Vetter 2007). Perhaps because some species are so young, reproductive isolation is incomplete, and introgressive hybridization has been reported (Roques et al. 2001). The prevalence of this group in Canadian Pacific waters affords an excellent opportunity to test the limits of the DNA barcoding system.

This study examines patterns of sequence divergence at COI in 201 fish species from Canadian Pacific waters, representing roughly half of the known fauna. The investigation provides a further test of COI barcodes for fish identification. It also explores the use of DNA barcodes to flag overlooked species and discusses potential limitations of the system.

Materials and methods

Taxonomic coverage

This study examined 1,225 individuals representing 201 fish species from Canadian Pacific waters. When possible, at least five adults were analyzed per species. All specimens are stored as vouchers in the Royal British Columbia Museum, Victoria, Canada. Collection details are recorded in the public project file “Fishes of Pacific Canada Part I” on the Barcode of Life Data System (BOLD). Table S1 lists the species sorted by the taxonomic hierarchy of Nelson (1994). Samples were collected from 2004 to 2007 at both shallow (400 m) and deepwater (2,000 m) sites around Vancouver Island and the Queen Charlotte Islands (Fig. 1).
Fig. 1

Collection sites for specimens examined in this study. Further details on the collection location of each specimen, including GPS coordinates and depth, are provided in the project ‘Fishes of Pacific Canada’ in the Published Projects section of the Barcode of Life Data System (BOLD)

DNA analysis

DNA was extracted from the muscle tissue of each specimen using an automated glass fiber protocol (Ivanova et al. 2006). The 650-bp barcode region of COI was then amplified under the following thermal conditions: 2 min at 95°C; 35 cycles of 0.5 min at 94°C, 0.5 min at 52°C, and 1 min at 72°C; 10 min at 72°C; hold at 4°C. Each 12.5 μl PCR mix included:

- 6.25 μl of 10% trehalose
- 2.00 μl of ultrapure water
- 1.25 μl of 10× PCR buffer [200 mM Tris–HCl (pH 8.4), 500 mM KCl]
- 0.625 μl of MgCl2 (50 mM)
- 0.125 μl of each primer cocktail (0.01 mM; cocktails C_FishF1t1 and C_FishR1t1 from Ivanova et al. 2007)
- 0.062 μl of each dNTP (10 mM)
- 0.060 μl of Platinum® Taq Polymerase (Invitrogen)
- 2.0 μl of DNA template

PCR amplicons were visualized on 1.2% agarose E-Gel® gels (Invitrogen). They were bidirectionally sequenced with the sequencing primers M13F and M13R (Ivanova et al. 2007) and the BigDye® Terminator v.3.1 Cycle Sequencing Kit (Applied Biosystems, Inc.) on an ABI 3730 capillary sequencer, following the manufacturer’s instructions.
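As a quick consistency check on this recipe, the final in-reaction concentrations follow from C1V1 = C2V2. The Python sketch below is illustrative only. It assumes that the 0.062 μl dNTP aliquot is a single mix containing 10 mM of each nucleotide; the volumes only sum to about 12.5 μl under that reading. The component labels are our own shorthand.

```python
# Final concentrations in the 12.5 ul COI PCR, via C1*V1 = C2*V2.
# Assumption: the 0.062 ul dNTP aliquot is ONE mix at 10 mM per nucleotide;
# only then do the component volumes add up to ~12.5 ul.

# (component, stock concentration or None, unit, volume added in ul)
RECIPE = [
    ("trehalose", 10.0, "%", 6.25),
    ("ultrapure water", None, "", 2.00),
    ("10x PCR buffer", 10.0, "x", 1.25),
    ("MgCl2", 50.0, "mM", 0.625),
    ("C_FishF1t1 primers", 10.0, "uM", 0.125),  # 0.01 mM stock = 10 uM
    ("C_FishR1t1 primers", 10.0, "uM", 0.125),
    ("dNTP mix (each)", 10.0, "mM", 0.062),
    ("Platinum Taq", None, "", 0.060),
    ("DNA template", None, "", 2.0),
]


def total_volume(recipe):
    """Sum of all component volumes (ul)."""
    return sum(volume for _, _, _, volume in recipe)


def final_concentrations(recipe):
    """Map component -> (final concentration, unit) using C2 = C1 * V1 / V_total."""
    v_total = total_volume(recipe)
    return {
        name: (stock * volume / v_total, unit)
        for name, stock, unit, volume in recipe
        if stock is not None
    }


if __name__ == "__main__":
    print(f"total volume: {total_volume(RECIPE):.3f} ul")
    for name, (conc, unit) in final_concentrations(RECIPE).items():
        print(f"{name:>20}: {conc:.3g} {unit}")
```

Under these assumptions, the sketch gives a total of about 12.5 μl. The final concentrations come out at roughly 5% trehalose, 1× buffer, 2.5 mM MgCl2, 0.1 μM of each primer cocktail, and about 0.05 mM of each dNTP.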

Sequence data were submitted to the Barcode of Life Data System (BOLD; Ratnasingham and Hebert 2007) and to GenBank (accession numbers in Table S1). Specimen and collection data, sequences, specimen images, and trace files are provided in the project ‘Fishes of Pacific Canada Part I’ in BOLD. Sequences were compared using the Kimura 2-parameter (K2P) distance metric (Kimura 1980). Genetic distances and initial neighbor-joining (NJ) clustering were computed with the BOLD Management and Analysis System. Further analyses examined COI sequences from a larger number of Sebastes species by combining results from the present study (27 species) with those of Hyde and Vetter (2007; 92 species) and of the publicly available barcoding project ‘Fishes of Argentina’ (2 species; see BOLD). All additional NJ analyses were run in MEGA version 3.1 (Kumar et al. 2004), with bootstrap support estimated from 1,000 replicates.
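The K2P metric corrects the observed proportion of differing sites for multiple substitutions at the same site. It also lets transitions (A↔G, C↔T) and transversions occur at different rates:

d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q)

Here P and Q are the proportions of compared sites that differ by a transition or by a transversion, respectively. The Python function below is a minimal sketch of that formula. It assumes pre-aligned sequences and skips any site where either sequence has a gap or ambiguous base (pairwise deletion). It is not the code used by BOLD or MEGA.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura (1980) two-parameter distance between two aligned sequences.

    Sites with a gap or ambiguity code in either sequence are skipped
    (pairwise deletion). Returns substitutions per site (multiply by 100 for %).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in VALID or b not in VALID:
            continue
        compared += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("K2P undefined: sequences too divergent (saturated)")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)
```

Because the paper reports distances as percentages, a value such as 0.25% corresponds to `k2p_distance(...) == 0.0025`.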


Results

COI amplicons were recovered from all 1,225 individuals, and no indels or stop codons were encountered. Sequence length averaged 646 bp (range 490–652 bp), and fewer than 2% of the records were shorter than 600 bp. Overall nucleotide frequencies were C (28.94%), T (29.31%), A (23.30%), and G (18.45%).
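Pooled base composition of this kind is a simple count over all barcode sequences. One quantity it implies is a GC content of about 47.4% (28.94% C + 18.45% G). The sketch below is a minimal Python version. It assumes sequences are plain strings and ignores gaps and ambiguity codes; the reported values are copied from the text only for comparison.

```python
from collections import Counter


def base_composition(sequences):
    """Percent A, C, G and T pooled over all sequences (other symbols ignored)."""
    counts = Counter()
    for seq in sequences:
        counts.update(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A/C/G/T bases found")
    return {base: 100 * counts[base] / total for base in "ACGT"}


def gc_content(composition):
    """GC content (%) from a composition dict."""
    return composition["G"] + composition["C"]


# Pooled frequencies reported above for the 1,225 barcodes.
REPORTED = {"A": 23.30, "C": 28.94, "G": 18.45, "T": 29.31}
```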

An NJ tree of COI sequence divergences (K2P) showed that most species formed cohesive units (Fig. S1). The mean K2P distance between congeneric species (3.75%) was approximately 15-fold higher than that within species (0.25%). The half-logarithmic dot plot in Fig. 2 further illustrates the clear division between intra- and interspecific sequence variation. For each species, it contrasts the genetic distance within the species with the distance to its nearest neighbor.
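Neighbor joining builds a tree by repeatedly joining the pair of nodes that minimizes the criterion Q(i, j) = (n − 2)d(i, j) − Σd(i, ·) − Σd(j, ·). The joined pair is then replaced by a new node. The sketch below implements that textbook algorithm in plain Python for a small distance matrix and returns an unrooted Newick string. It is illustrative only, not BOLD's or MEGA's implementation, and it breaks ties by taking the first pair encountered.

```python
def neighbor_joining(names, dist):
    """Neighbor joining on a symmetric distance matrix; returns unrooted Newick."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    d = [list(row) for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # net divergence of each node
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best
        # Branch lengths from i and j to their new parent node.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new node to every remaining node.
        du = [0.5 * (d[i][k] + d[j][k] - d[i][j]) for k in keep]
        d = [[d[u][v] for v in keep] + [du[x]] for x, u in enumerate(keep)]
        d.append(du + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Resolve the final three nodes as an unrooted trifurcation.
    x, y, z = nodes
    lx = (d[0][1] + d[0][2] - d[1][2]) / 2
    ly = (d[0][1] + d[1][2] - d[0][2]) / 2
    lz = (d[0][2] + d[1][2] - d[0][1]) / 2
    return f"({x}:{lx:g},{y}:{ly:g},{z}:{lz:g});"
```

For an additive five-taxon matrix (made-up distances, not barcode data), such as the one in the check, the sketch returns `(d:2,e:1,(c:4,(a:2,b:3):3):2);`.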
Fig. 2

Half-logarithmic dot plot of genetic distances within each species against genetic distances to nearest neighbor. For each species, there is a black dot showing intraspecific K2P distance and a gray dot directly above or below it, which shows distance to nearest neighbor. Sorting by intra- and interspecific distance allows the relative distances for each species to be seen. This graph indicates that few species have nearest-neighbor distances that are less than the mean intraspecific distance for that species. A line drawn at 1% separates most intraspecific from interspecific values
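The per-species comparison behind Fig. 2 can be sketched in two steps. For each species, take the mean of all within-species pairwise distances. Then take the smallest distance from any of its specimens to a specimen of another species. This minimal Python sketch assumes a precomputed symmetric distance matrix in percent and one species label per specimen. The 1% default mirrors the reference line drawn in the figure; it is a visual guide, not a formal species threshold.

```python
from itertools import combinations
from statistics import mean


def barcode_gap_summary(labels, dist):
    """species -> (mean intraspecific distance or None, nearest-neighbor distance)."""
    summary = {}
    for sp in sorted(set(labels)):
        members = [i for i, s in enumerate(labels) if s == sp]
        others = [i for i, s in enumerate(labels) if s != sp]
        if not others:
            raise ValueError("need at least two species")
        intra = [dist[i][j] for i, j in combinations(members, 2)]
        nearest = min(dist[i][j] for i in members for j in others)
        # Singletons have no within-species comparisons.
        summary[sp] = (mean(intra) if intra else None, nearest)
    return summary


def closer_than(summary, threshold=1.0):
    """Species whose nearest neighbor lies closer than `threshold` (%)."""
    return sorted(sp for sp, (_, nn) in summary.items() if nn < threshold)
```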

The NJ tree revealed unusually shallow genetic distances between congeneric species of the genera Sebastes and Sebastolobus. Members of these genera also accounted for the lowest nearest-neighbor distances in Fig. 2. Re-analysis after excluding these two genera increased the average congeneric distance to 6.68%, a value much closer to those reported in other studies of marine fishes.
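Excluding Sebastes and Sebastolobus before averaging amounts to filtering the congeneric comparisons. The minimal Python sketch below makes two assumptions. First, the mean is taken over all specimen pairs from different species of the same genus; the text does not say whether pairs were pooled this way or averaged per species first. Second, labels are binomials whose first word is the genus. The toy genera used in the check are hypothetical.

```python
from itertools import combinations
from statistics import mean


def mean_congeneric_distance(labels, dist, exclude_genera=()):
    """Mean distance over specimen pairs from different species of one genus.

    labels are binomials ("Genus species"); pairs whose genus is listed in
    exclude_genera are skipped. Returns None if no pairs remain.
    """
    values = []
    for i, j in combinations(range(len(labels)), 2):
        genus_i, genus_j = labels[i].split()[0], labels[j].split()[0]
        if genus_i != genus_j or labels[i] == labels[j]:
            continue  # different genera, or same species
        if genus_i in exclude_genera:
            continue
        values.append(dist[i][j])
    return mean(values) if values else None
```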

The analysis of COI sequences from 94 species of Sebastes and 2 species of Sebastolobus (Table S2) revealed a mean K2P sequence divergence of 3.4% between congeners and 0.18% within species. Despite the low divergence between congeneric species, most sebastid species appeared to possess diagnostic COI sequences, although in most cases this result rests on very small sample sizes (Fig. 3). However, 13 species showed little or no divergence from other species, including four from the Canadian Pacific: Sebastes crameri, S. reedi, S. wilsoni, and S. zacentrus.
Fig. 3

A neighbor-joining tree of COI sequence divergences (K2P) in 94 species of Sebastes and 2 species of Sebastolobus. Numbers at nodes represent bootstrap values (only values greater than 80 are shown). The number of specimens follows each species name. Species in gray share barcodes. Species with specimens obtained for this study are indicated with an asterisk
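Bootstrap values like those in Fig. 3 are obtained in three steps:

1. Resample alignment columns with replacement.
2. Rebuild a tree from each pseudo-replicate.
3. Report the percentage of replicates in which each clade is recovered.

The Python sketch below shows only the resampling and counting logic. `build_clades` is a hypothetical callback standing in for a real tree builder such as NJ. `identical_groups` is a deliberately trivial placeholder used only to exercise the code.

```python
import random
from collections import Counter


def resample_columns(alignment, rng):
    """One pseudo-replicate: draw alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def bootstrap_support(alignment, build_clades, replicates=1000, seed=1):
    """Percent of pseudo-replicates in which each clade (a frozenset of names) appears."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(replicates):
        counts.update(set(build_clades(resample_columns(alignment, rng))))
    return {clade: 100 * n / replicates for clade, n in counts.items()}


def identical_groups(alignment):
    """Hypothetical placeholder 'tree builder': groups identical sequences."""
    groups = {}
    for name, seq in alignment.items():
        groups.setdefault(seq, set()).add(name)
    return [frozenset(g) for g in groups.values() if len(g) > 1]
```

To mimic the figure's display rule, the result can be filtered to `{c: s for c, s in support.items() if s > 80}`.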

By contrast, one species (Paraliparis pectoralis) comprised two distinct sequence clusters separated by more than 2.3% divergence (Fig. 4), suggesting the presence of cryptic species. The split in Careproctus cypselurus (Fig. 4) is well below this level and may indicate population substructure, but more samples are required to test this conclusion.
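One simple way to look for splits like the one in P. pectoralis is to cluster a species' specimens by single linkage at a distance cut-off and check whether more than one group remains. This is a sketch under stated assumptions. The cut-off is a free parameter chosen by the user: the text reports more than 2.3% between the two P. pectoralis clusters but does not define a formal threshold. Distances are assumed to be precomputed in percent.

```python
def single_linkage_groups(names, dist, cutoff):
    """Groups of specimens connected by chains of distances below `cutoff`."""
    parent = list(range(len(names)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if dist[i][j] < cutoff:
                parent[find(i)] = find(j)  # merge the two components

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return sorted(groups.values())


def looks_split(names, dist, cutoff):
    """True if a nominal species falls into more than one cluster at `cutoff`."""
    return len(single_linkage_groups(names, dist, cutoff)) > 1
```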
Fig. 4

A neighbor-joining tree of COI sequence divergences (K2P) in 19 species of the Liparidae. Numbers at nodes represent bootstrap values (only values greater than 80 are shown). The number of specimens follows each species name. Specimen images are shown in cases where a presently recognized species may actually be a species pair


Discussion

DNA barcoding delivers species-level identifications when taxa possess unique COI sequence clusters. This condition was met for more than 98% of the species in this study; the only cases of compromised resolution involved four species of Sebastes. Twelve individuals of S. reedi formed a monophyletic cluster, but it was embedded within two clusters of S. crameri, rendering the latter species paraphyletic. Despite this paraphyly, these two species may be identifiable through barcodes, but more specimens need to be analyzed to verify this conclusion. The other species pair, S. wilsoni and S. zacentrus, also shows close sequence similarity, but not identity. More specimens need to be analyzed to ascertain whether slight interspecific differentiation (involving just one to three nucleotides) separates these taxa, as has been shown in certain butterflies (Burns et al. 2007). To fully validate the effectiveness of barcoding for Sebastes identification, much larger samples of each species need to be examined. This work needs to extend beyond these four problematic species, because nine other species are each represented by a single COI record that closely resembles the sequence of another taxon. The genus Sebastes likely presents a challenging case for the barcoding system because of the group’s recent diversification [starting approximately 8–9 mya; see Hyde and Vetter (2007)] and the occurrence of introgressive hybridization between close relatives (Roques et al. 2001). The few problematic instances revealed in the present study will require further investigation with fast-evolving nuclear markers to reliably distinguish between species. Candidates include the recombination activating gene 2 (RAG2) and internal transcribed spacer 1 (ITS1). This is a feasible prospect given recent genomic developments in this group (Kai et al. 2002a, b; Narum et al. 2004). However, even in a problematic group such as Sebastes, supplemental nuclear loci will be needed only in a minority of cases, since DNA barcoding alone still achieved 84% accuracy at the species level.

The mean sequence distance between congeneric fish species (3.75%) was considerably lower in the Canadian Pacific than in Australian marine fishes (9.93%; Ward et al. 2005) or Canadian freshwater species (8.30%; Hubert et al. 2008). However, excluding Sebastes and Sebastolobus raised the average congeneric distance to 6.68%. Cases of deep genetic divergence within single species often indicate overlooked cryptic species (Moritz 1994; Meyer and Paulay 2005). DNA barcoding is an effective first step in detecting such cases, because a threshold approach can flag provisional species whenever intraspecific variation is unusually high. Paraliparis pectoralis (Fig. 4) is one such example in the current dataset. It comprised two cohesive clusters separated by an average divergence roughly tenfold higher than the mean intraspecific divergence (0.25%). Animals of both groups were caught in the same depth range, 1,400 to 1,600 m, and look very similar (Fig. 4). Detailed depth information is provided in the project ‘Fishes of Pacific Canada’ in the Published Projects section of BOLD. Although further investigation is needed, this split likely represents a case of overlooked diversity, since deepwater liparids are much less studied than their coastal relatives. Their habitat [they occur as deep as 7,500 m below sea level (Andriashev and Pitruk 1993)] makes them hard to sample. Their delicate bodies are also often damaged by standard fishing methods, making it difficult to obtain individuals for detailed study. Many liparid species are known only from a single specimen, so the potential for taxonomic uncertainty is high. A further advantage of the barcoding framework is that we could quickly compare the distinct clusters from the Canadian Pacific with records for all other barcoded liparids (45 species), using the BOLD identification engine.
Surprisingly, one of the clusters formed a close match with an Atlantic liparid, showing the power of DNA barcoding not only to detect overlooked species, but to quickly generate important insights regarding the taxonomy and diversification of this group.
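The tenfold comparison above can be turned into a simple screening rule. The sketch below assumes a flag threshold of 10 × the mean intraspecific distance (10 × 0.25% = 2.5%). As input, it takes the deepest within-species distance for each species. Both the factor and the function name are illustrative, not a procedure stated by the authors, and the species keys in the check are placeholders.

```python
def flag_deep_splits(max_intraspecific, mean_intraspecific=0.25, factor=10.0):
    """Return (flagged species, threshold in %).

    max_intraspecific maps species -> deepest within-species distance (%).
    A species is flagged when that distance exceeds factor * mean_intraspecific.
    """
    threshold = factor * mean_intraspecific
    flagged = sorted(sp for sp, d in max_intraspecific.items() if d > threshold)
    return flagged, threshold
```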

The difficulty of obtaining specimens for morphological study is not unique to liparids but applies to deepwater fishes in general. One useful application of barcoding is to use divergence values to gain a preliminary perspective on taxonomic diversity, allowing work to focus on exceptional cases. For example, five specimens collected during this study were assigned to Paraliparis on the basis of morphological attributes (Fig. 4), but species-level diagnosis was impossible. Using the DNA barcode library, we compared this unknown Paraliparis with the 45 species in the family Liparidae that have barcode records. Interestingly, the five Canadian specimens clustered tightly with an apparently undescribed species of Paraliparis from New Zealand and Antarctica. Thus, although its species identity awaits determination, the taxon is clearly widespread.
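The library comparison described here can be approximated by a nearest-match search. Compute the distance from the query to every reference barcode, then report the closest species if it lies within a cut-off. This is not the algorithm of the BOLD identification engine. The sketch below is a stand-in that assumes aligned, equal-length sequences and uses K2P distances in percent. The 2% cut-off and the `identify` name are hypothetical choices.

```python
import math

PURINES = frozenset("AG")


def k2p(a, b):
    """Kimura 2-parameter distance; gaps and ambiguity codes are skipped."""
    sites = [(x, y) for x, y in zip(a.upper(), b.upper()) if x in "ACGT" and y in "ACGT"]
    if not sites:
        return math.inf
    ts = sum(1 for x, y in sites if x != y and (x in PURINES) == (y in PURINES))
    tv = sum(1 for x, y in sites if (x in PURINES) != (y in PURINES))
    p, q = ts / len(sites), tv / len(sites)
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        return math.inf  # saturated: K2P undefined, treat as infinitely distant
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


def identify(query, library, cutoff=2.0):
    """Closest reference species and its distance (%); species is None beyond cutoff.

    library is an iterable of (species, aligned reference sequence) pairs.
    """
    best_species, best_dist = None, math.inf
    for species, reference in library:
        dist = 100 * k2p(query, reference)
        if dist < best_dist:
            best_species, best_dist = species, dist
    if best_dist > cutoff:
        return None, best_dist
    return best_species, best_dist
```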

In summary, this study has established that most Pacific Canadian fish species possess a single, tightly cohesive array of barcode sequences distinct from that of any other species. Patterns of COI divergence generally corresponded closely with species units recognized through prior morphological analyses. However, they exposed some difficulties with members of the genus Sebastes that require further investigation, potentially involving supplemental nuclear loci to fully resolve species. The study also highlights the likelihood of an overlooked deepwater liparid species in Canadian Pacific waters, illustrating the utility of DNA barcoding in revealing overlooked diversity.


Acknowledgments

This study was supported by the Canadian Barcode of Life Network through funding from NSERC and Genome Canada through the Ontario Genomics Institute. We thank the Canadian Department of Fisheries and Oceans and the Canadian Coast Guard for ship time and other support during the cruises on which specimens were collected. We also thank Ken Fong, Graham Gillespie, Gavin Hanke, Katy Hind, John Klymko, and Dennis Rutherford for help with specimen collections, and Mark Stoeckle for sharing his idea of half-logarithmic dot plots for genetic distances.

Supplementary material

227_2009_1284_MOESM1_ESM.pdf (PDF, 855 kb): A neighbor-joining tree of COI sequence divergences (K2P) in all 1,225 individuals of this study. Species names, BOLD process ID, sample ID, sequence length, and numbers of ambiguous bases are given at branch tips.
227_2009_1284_MOESM2_ESM.pdf (PDF, 457 kb)

Copyright information

© Springer-Verlag 2009