Collection site, materials, and identification
Churchill is situated at the confluence of the Churchill River and Hudson Bay [18]. This region is characterized by a long, harsh winter and short flight season for aquatic insects [6]. The area represents an ecotone between northern boreal forest and subarctic tundra habitats.
Mayfly, stonefly, and caddisfly specimens were collected in the Churchill region from 2002 to 2007. Average daily air temperatures at Churchill in June and September are typically lower than 7°C [19], conditions that are unsuitable for flight by most caddisflies [20]. As a result, most collecting activities for caddisflies occurred from late June to late August of 2006 and 2007. Mayfly samples were intensively collected during the last two weeks of July in 2007, while stoneflies were contributed by researchers conducting various bio-surveillance projects in this area from mid-June to mid-August of 2006 and 2007. A wide range of lentic and lotic habitats were sampled, including the Churchill River, tundra ponds, lakes, small streams, and pools on rock bluffs near the margin of Hudson Bay.
Adult samples were collected using UV light traps, aerial nets, Malaise traps, and pitfall traps. Larval samples were collected using a kicknet and by handpicking. Adult specimens were pinned or preserved in 95% ethanol, while all larval samples were kept in 95% ethanol. EPT specimens are deposited at the Biodiversity Institute of Ontario (University of Guelph), the University of Manitoba, and the University of Minnesota Insect Collection.
Sequencing of COI barcodes for most EPT samples collected during 2002-2006 was conducted before taxonomic experts became involved, so all individuals in the collections, including those of dominant species, were sequenced. The DNA barcodes should therefore largely reflect the relative abundances of species in the obtained samples. Specimens were subsequently sorted into groups based on their COI clustering patterns. Morphological identification was carried out independently for each of the barcode cluster series after DNA analysis. Additional EPT specimens collected in 2007 were identified before DNA analysis and were then added to the library.
DNA analysis and sequence analysis
COI sequences were generated at the Canadian Centre for DNA Barcoding, University of Guelph. Standard barcoding protocols were followed [21]. Typically, a single leg was used for the extraction of genomic DNA using an AcroPrep™ 96 1 ml filter plate (PALL) with 3.0 μm glass fiber. DNA was eluted in 40 μl of dH2O. Full-length COI barcodes (658 bp) were amplified using two primer sets: LepF1 (5'-ATTCAACCAATCATAAAGATATTGG-3')/LepR1 (5'-TAAACTTCTGGATGTCCAAAAAATCA-3') [22] and LCO1490 (5'-GGTCAACAAATCATAAAGATATTGG-3')/HCO2198 (5'-TAAACTTCAGGGTGACCAAAAAATCA-3') [23]. MLepF1 and MLepR1 primers [16] were employed when full-length PCR amplification was not successful. A new reverse mini-primer tagged with an M13 tail, MEPTR1-t1 (5'-CAGGAAACAGCTATGACGGTGGRTATACIGTTCAICC-3'), was paired with LCO1490-t1 to recover the first 325 bp at the 5' terminus of the barcode region. This primer set proved to be effective in EPTs, particularly mayflies.
Each PCR reaction had a total volume of 12.5 μl and contained 5% trehalose (D-(+)-Trehalose dihydrate), 1.25 μl of 10× reaction buffer, 2.5 mM of MgCl2, 1.25 pmol each of forward and reverse primer, 50 μM of dNTP (Promega), 0.3 U of Platinum Taq DNA polymerase (Invitrogen), and 2 μl of genomic DNA. PCR products were visualized on a 2% agarose E-gel® 96-well system (Invitrogen). Amplification products were sequenced bi-directionally using BigDye v3.1 and analyzed on an ABI 3730xl DNA Analyzer (Applied Biosystems) as described in deWaard et al. [24] and Hajibabaei et al. [25].
COI sequences were aligned in MEGA 4.0 [26] using the integrated ClustalX method with default parameters. The amino acid translation was examined to ensure that no gaps or stop codons were present in the alignment. Unique haplotypes for each species were recognized using analytical tools available at the "DNA Barcoding Tools" website http://www.ibarcode.org [27]. These haplotypes were then imported into MEGA for tree construction using the Neighbour-Joining method with pair-wise deletion of missing sites and Kimura-2-Parameter (K2P) distance [28] options. A Newick format tree was exported from MEGA and annotated using an online tool for phylogenetic tree display, Interactive Tree of Life [29]. Genetic distances were obtained using sequence analytic tools ("Nearest Neighbour Summary") provided in BOLD using K2P distances for all sequences longer than 350 bp.
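For readers unfamiliar with the K2P distance used above, the following is a minimal Python sketch of the standard Kimura two-parameter formula, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences. It is an illustration only and not the MEGA or BOLD implementation; the function name `k2p_distance` is hypothetical, and pair-wise deletion is assumed to mean skipping any site where either sequence lacks an unambiguous base.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """K2P distance between two aligned sequences (hypothetical helper).

    Assumption: pair-wise deletion means that sites with a gap or an
    ambiguous base in either sequence are skipped for this pair only.
    """
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # pair-wise deletion of missing or ambiguous sites
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

The formula is undefined for very divergent pairs (when 1 - 2P - Q or 1 - 2Q is not positive), in which case `math.log` or `math.sqrt` raises an error; this does not arise at the low divergences typical of barcode comparisons within a genus.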
Testing barcode cluster delineation and the morphological species concept
The morphological identifications employed in this work are based on current nomenclature for each taxonomic group. All valid species are morphologically distinguishable from others and possess consistent diagnostic character sets, even though barcode sequences may show distinctive groups within such morphological species. To aid the discussion, we refer to the units recognized through morphological study as 'morphospecies' throughout this paper.
To test how patterns of genetic divergence at COI correspond to morphological species concepts, we estimated the Churchill EPT species diversity based on the similarity and clustering pattern in their COI barcodes independent of taxonomic assignments. We employed an arbitrary threshold of 2% sequence divergence to draw boundaries for barcode haplotype clusters. This threshold was selected because intraspecific divergences observed in a variety of groups rarely exceed this value [see [3, 22], and [30]]. Although exceptions have been observed in some taxa [e.g., [31, 32]], we emphasize that the species definition used in this work is not based on any genetic threshold, but on concordant evidence from morphology and barcode similarity. We seek only to determine if such a simple delimitation of mitochondrial COI haplogroups for Churchill's EPTs could be informative in evaluating the trend and completeness of general biological sampling, even if taxonomic expertise were not available.
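One way to picture threshold-based delimitation is as single-linkage grouping: two sequences fall in the same cluster if they are connected by a chain of pairs that each differ by no more than 2%. The Python sketch below works under that assumption; the text does not state the linkage rule actually used, and the helper names (`p_distance`, `cluster_by_threshold`) are hypothetical. For brevity it uses the uncorrected proportion of differing sites, but any distance function, such as K2P, could be passed in.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Uncorrected proportion of differing sites (illustrative stand-in for K2P)."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites between the two sequences")
    return sum(a != b for a, b in pairs) / len(pairs)


def cluster_by_threshold(seqs, threshold=0.02, distance=p_distance):
    """Group sequence IDs by single linkage at the given divergence threshold.

    Assumption: single linkage and a "less than or equal" comparison;
    neither detail is specified in the text.
    """
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


example = {
    "s1": "A" * 100,
    "s2": "A" * 99 + "G",        # 1% from s1
    "s3": "C" * 10 + "A" * 90,   # 10% from s1
}
clusters = cluster_by_threshold(example)
```

In this toy example, `s1` and `s2` end up in one cluster and `s3` in another, so the count of barcode clusters is two.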
Taxon accumulation curves were constructed to assess the degree of completeness of this survey and to compare the results that would be obtained with and without access to taxonomic expertise. Randomized accumulation curves were built based on morphospecies determined by taxonomists (XZ, LMJ, and RED) and on barcode clusters as delimited using a 2% threshold, using EstimateS V.8.0 [33] with 50 randomization replicates and default settings.
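The idea behind a randomized accumulation curve can be sketched in a few lines of Python. This is not EstimateS itself: it assumes that each randomization simply shuffles the order of samples without replacement and that the curve is the mean cumulative taxon count across shuffles. The function name and the input layout (one set of taxon labels per sample) are hypothetical.

```python
import random


def accumulation_curve(samples, replicates=50, seed=0):
    """Mean number of distinct taxa after 1..n samples over random sample orders.

    `samples` is a list of sets of taxon labels (morphospecies names or
    barcode-cluster IDs). Assumption: sampling order is shuffled without
    replacement in every replicate.
    """
    rng = random.Random(seed)
    totals = [0] * len(samples)
    for _ in range(replicates):
        order = list(samples)
        rng.shuffle(order)
        seen = set()
        for i, sample in enumerate(order):
            seen.update(sample)
            totals[i] += len(seen)
    return [total / replicates for total in totals]


toy_samples = [{"sp_a", "sp_b"}, {"sp_b", "sp_c"}, {"sp_c"}]
curve = accumulation_curve(toy_samples)
```

Running the same function once with morphospecies labels and once with barcode-cluster labels gives two curves that can be compared directly, which is the comparison described above.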
Additionally, the correspondence between these two measures and the total phylogenetic diversity was explored. DNA sequences were formatted for the program R version 2.8.1 [34] and analyzed using the packages APE [35] and CAIC [36]. A Neighbour-Joining tree based upon K2P distances and pair-wise deletion was reconstructed. For each number of tips, from 1 up to the total number of individuals sampled, tips were randomly drawn 1,000 times. For each replicate, total phylogenetic diversity was calculated, and the values were then averaged across randomizations for each number of tips. A detailed protocol along with all commands used is provided in Additional file 4. Resulting phylogenetic diversity values were multiplied by a scaling factor to allow their presentation on the same scale as the species accumulation curves, aiding comparison of their shapes.
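The resampling logic can be illustrated with a short Python sketch. It is not the R protocol of Additional file 4 and does not use APE or CAIC. It assumes a rooted tree stored as a mapping from each node to its parent and branch length, and it assumes that phylogenetic diversity is the total length of all branches on the paths from the sampled tips to the root; both the data layout and the function names are hypothetical.

```python
import random


def phylogenetic_diversity(tree, tips):
    """Total branch length on the paths from the chosen tips to the root.

    `tree` maps node -> (parent, branch_length); the root has no entry.
    Assumption: the path to the root is included in the total.
    """
    used = set()
    for node in tips:
        # Walk towards the root, stopping once a branch is already counted.
        while node in tree and node not in used:
            used.add(node)
            node = tree[node][0]
    return sum(tree[node][1] for node in used)


def pd_rarefaction(tree, tips, replicates=1000, seed=0):
    """Mean phylogenetic diversity for random subsets of 1..len(tips) tips."""
    rng = random.Random(seed)
    curve = []
    for k in range(1, len(tips) + 1):
        total = sum(
            phylogenetic_diversity(tree, rng.sample(tips, k))
            for _ in range(replicates)
        )
        curve.append(total / replicates)
    return curve


toy_tree = {
    "n1": ("root", 1.0),
    "A": ("n1", 2.0),
    "B": ("n1", 3.0),
    "C": ("root", 4.0),
}
pd_curve = pd_rarefaction(toy_tree, ["A", "B", "C"])
```

Multiplying every value in `pd_curve` by a constant factor rescales the curve without changing its shape, which is what allows it to be plotted alongside the taxon accumulation curves.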