Introduction

Copepods (Copepoda) are a group of small crustaceans found in various aquatic environments and they are described as the most abundant metazoans on earth (Humes 1994). The subclass Copepoda consists of over 250 described families, 2,600 genera, and 21,000 described species classified into ten orders (Walter and Boxshall 2008). Their life histories are diverse; planktonic and benthic copepods are an important ecological link in the aquatic food chain (Gee 1987; Ohman and Hirche 2001), but approximately one third of marine copepod species live as associates, commensals, or parasites on invertebrates and fishes (Humes 1994).

Parasitic copepods are commonly found on both farmed and wild marine finfish (Johnson and Fast 2004). They feed on host mucus, epidermal cells, tissues, and blood, which causes physiological stress, immune dysfunction, impaired swimming ability, and possibly death (Boxaspen 2006; Costello 2006; Johnson and Fast 2004; Tully and Nolan 2002). Members of the family Caligidae, especially the genera Caligus and Lepeophtheirus, are commonly referred to as sea lice (Costello 2006; Johnson et al. 2004; Pike and Wadsworth 1999). They are the most economically important parasites in the world salmon farming industry and may cause direct and indirect economic losses of €300 million (US$480 million) annually (Costello 2009). In addition, there is concern that salmon farms elevate the risk of sea lice infections on wild salmon beyond natural levels and thereby depress the abundance of wild salmon stocks (Costello 2006; Heuch et al. 2005; Krkošek et al. 2007a; Krkošek et al. 2007b; Todd et al. 2006).

In the North Atlantic Ocean, Lepeophtheirus salmonis and Caligus elongatus account for the most serious infestations of cultured and wild salmonids (Johnson et al. 2004; Pike and Wadsworth 1999). In the eastern North Pacific Ocean, L. salmonis and Caligus clemensi have been found on farmed Atlantic salmon (Salmo salar) and wild Pacific salmon (Oncorhynchus spp.; Beamish et al. 2009; Beamish et al. 2005; Saksida et al. 2007). Although L. salmonis is prevalent on both the Atlantic and Pacific coasts, earlier studies suggested that the Pacific and Atlantic populations of L. salmonis are genetically distinct (Tjensvoll et al. 2006; Todd et al. 2004). More recent genomic studies strongly suggest that distinct species of L. salmonis exist in the Pacific and Atlantic Oceans following a separation that occurred from 2.5 to 11 million years ago (Boulding et al. 2009; Yazawa et al. 2008). These parasites are referred to herein as the Pacific and Atlantic forms of L. salmonis, respectively. In the southern hemisphere, Caligus rogercresseyi is the dominant species affecting salmonid aquaculture in Chile, where the parasites were found on farmed salmon in 99% of the established culture cages (Boxshall and Bravo 2000; Carvajal et al. 1998).

Lepeophtheirus and Caligus species are distinguished from each other based on morphological characters (Kabata 1979). The life cycle of L. salmonis has a total of ten developmental stages, while those of C. elongatus and C. rogercresseyi are similar but appear to lack pre-adult stages (Piasecki and MacKinnon 1995; González and Carvajal 2003). The host range of L. salmonis mainly includes salmonids, but the parasite has also been reported from non-salmonid hosts, including sticklebacks, that co-occur with salmon (Jones et al. 2006). In contrast, some Caligus species have a broad host range of salmonids and non-salmonids (Costello 2006; Johnson et al. 2004). Among its salmonid hosts, L. salmonis displays clear preferences, with the heaviest infestations and greatest impacts occurring on Atlantic salmon (S. salar) and sea trout (Salmo trutta), followed by rainbow trout (Oncorhynchus mykiss), chinook (Oncorhynchus tshawytscha), and coho salmon (Oncorhynchus kisutch; Dawson et al. 1997; Fast et al. 2002; Johnson and Albright 1992). In contrast, C. rogercresseyi occurs in higher numbers on caged rainbow trout than on Atlantic or coho salmon (González et al. 2000). Thus, while L. salmonis and Caligus species exhibit similar parasitic life history strategies, they display considerable differences in morphology, life cycle, and host range.

Another parasite, Lernaeocera branchialis, belongs to the copepod family Pennellidae and is only distantly related to the caligid copepods. This species is commonly found on gadoids, particularly Atlantic cod (Gadus morhua) and haddock (Melanogrammus aeglefinus), in the North Atlantic Ocean and North Sea (Bricknell et al. 2006; Smith et al. 2007). This parasite has a negative impact on wild gadoids and is a potentially serious pathogen of farmed Atlantic cod (Smith et al. 2007). A compilation of genomic information on parasitic copepods is an important tool for understanding their biology as well as for the study of population genetics and control strategies.

In this study, we report on over 150,000 expressed sequence tags (ESTs) obtained from Pacific L. salmonis (49,672 new ESTs in addition to 14,994 previously reported ESTs), Atlantic L. salmonis (57,349 ESTs), C. clemensi (14,821 ESTs), C. rogercresseyi (32,135 ESTs), and L. branchialis (16,441 ESTs). These ESTs were assembled into complete or partial genes and annotated by comparisons to known proteins in public databases. In addition, whole mitochondrial (mt) genome sequences of two Caligus species, C. clemensi and C. rogercresseyi, were determined and compared to each other and to L. salmonis. These studies show high levels of sequence divergence in nuclear and mtDNA genes. This report describes the production and characteristics of the largest genomic resource for copepods.

Materials and Methods

EST Analysis

Specimens belonging to the Pacific (British Columbia, Canada (BC)) and Atlantic forms of L. salmonis (Norway and New Brunswick, Canada), C. clemensi (BC), C. rogercresseyi (Chile), and L. branchialis (Scotland, UK) were collected and stored at −80°C or in RNAlater (Invitrogen) until RNA extraction. Total RNA was extracted from whole bodies (from various life stages and both sexes) using TRIzol reagent (Invitrogen) and spin-column purified using RNeasy Mini kits (Qiagen). The purified RNAs were then quantified and quality checked by spectrophotometer (NanoDrop Technologies) and agarose gel, respectively. Approximately 1.0–3.0 μg of total RNA was converted into cDNA and normalized and was directionally cloned into pAL 17.3 vector (Evrogen Co.).

Clones from each library were robotically arrayed in 384-well microtiter plates as detailed previously (Koop et al. 2008). Plasmid DNAs were extracted and sequenced on an ABI 3730 DNA analyzer (Applied Biosystems) with M13 forward and M13 reverse primers (L. salmonis and C. rogercresseyi) or with M13 forward and SP6 primers (C. clemensi and L. branchialis). These sequencing primers are shown in Supplemental Table 1. The resulting ESTs were assembled with CAP3 (Huang and Madan 1999) using default parameters. The assembled contigs (clusters + singletons) were annotated using RPS-BLAST and BLASTX comparisons with the Conserved Domain Database (CDD) and SwissProt (Bairoch and Apweiler 1996), respectively. The best BLAST match (E value threshold of 1 E−10) was used to identify contigs. Contigs that did not meet this threshold were annotated as “unknown.”
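The annotation rule described above (keep the best match at or below the E value threshold, otherwise label the contig "unknown") can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes BLAST results in the 12-column tab-delimited layout produced by BLAST+ `-outfmt 6` (E value in column 11), it takes "best match" to mean the lowest E value, and the function name `annotate_contigs` is hypothetical.

```python
import csv
import io

E_VALUE_THRESHOLD = 1e-10  # threshold stated in the text


def annotate_contigs(contig_ids, blast_tabular_text, threshold=E_VALUE_THRESHOLD):
    """Map each contig to its best subject ID, or "unknown" if no hit passes.

    Assumes tab-delimited rows: query, subject, ..., E value (column 11), bit score.
    "Best" is taken to be the lowest E value; ties keep the first hit seen.
    """
    best = {}  # contig_id -> (evalue, subject_id)
    reader = csv.reader(io.StringIO(blast_tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue > threshold:
            continue  # hit does not meet the threshold
        if query not in best or evalue < best[query][0]:
            best[query] = (evalue, subject)
    return {cid: best[cid][1] if cid in best else "unknown" for cid in contig_ids}
```

Contigs with no row in the BLAST output at all are also reported as "unknown", which matches how unannotated transcripts are counted later in the text.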

Reference full-length cDNAs (FLcDNAs) were identified as detailed previously (Leong et al. 2010). A single clone containing an entire coding sequence (CDS) for a gene product is considered a reference FLcDNA.

Complete Mitochondrial Genome Sequences of C. clemensi and C. rogercresseyi

The total genomic DNAs were extracted from an adult male C. clemensi and C. rogercresseyi as previously described (Yazawa et al. 2008). A sample placed in 5% Chelex-100 resin (Sigma) solution (5% Chelex-100 resin, 0.2% SDS in TE, with proteinase K (100 μg/ml)) was incubated for 30 min at 55°C, and the proteinase K was then inactivated for 10 min at 90°C. The sequence determination of the complete C. rogercresseyi mt genome was carried out as previously described (Yazawa et al. 2008). The PCR primer sets that were used were designed for 15 fragments (Supplemental Table 1) based on the EST sequences encoding mtDNA. PCR amplification was performed using 1.0 μl of extracted total genomic DNA of C. rogercresseyi with an initial denaturation step of 2 min at 95°C and then 30 cycles as follows: 30 s of denaturation at 95°C, 30 s of annealing at 55°C, and 3 min of extension at 72°C. PCR products were cloned into pCR2.1 vector (TA Cloning Kit, Invitrogen) with the manufacturer’s protocol, and each positive PCR product was sequenced as described above. The entire mt genome for C. clemensi was amplified by a long PCR method for three long fragments (5.4, 5.0, and 3.0 kb) and by PCR as described above for one short fragment (0.8 kb). The three PCR fragments were amplified using the PCR primer sets shown in Supplemental Table 1 and by using Long PCR Enzyme mix (Fermentas) following the manufacturer’s protocol. The long PCR amplification was performed using 100 ng of extracted total genomic DNA of C. clemensi with an initial denaturation step of 2 min at 94°C and then a two-step PCR procedure (40 cycles of 95°C for 10 s and 68°C for 7 min), and 10 min of final extension. The three long PCR products were cloned into pCR-XL-TOPO vector (Invitrogen) with the manufacturer’s protocol, and each positive PCR product was sequenced by primer walking (supplemental Table 1). The one short fragment was cloned into pCR2.1 vector and sequenced as described above.

Protein-coding and rRNA genes of C. clemensi and C. rogercresseyi were identified by alignment with the Pacific L. salmonis mt gene sequences (GenBank: EU288200). Most of the tRNA genes were identified using tRNAscan-SE 1.21 (Lowe and Eddy 1997) with the same parameters as described by Tjensvoll et al. (2005). The remaining tRNA genes were identified based on sequence homology with L. salmonis tRNA sequences.

Pair-wise Kimura two-parameter (K2P) distances (Kimura 1980) of the 16S rRNA and cox1 genes for C. clemensi, C. rogercresseyi, and Pacific L. salmonis were calculated in MEGA4 (Tamura et al. 2007) with default settings.
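For readers unfamiliar with the K2P model, the distance between two aligned sequences is d = −(1/2) ln[(1 − 2P − Q) √(1 − 2Q)], where P and Q are the proportions of sites showing transitions and transversions, respectively (Kimura 1980). The sketch below computes this directly; it is an illustration of the formula rather than a reproduction of the software settings used in the study, and it assumes that sites with gaps or ambiguity codes are simply skipped.

```python
import math

PURINES = {"A", "G"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        # Same chemical class (purine<->purine or pyrimidine<->pyrimidine)
        # is a transition; a change of class is a transversion.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log / math.sqrt raise ValueError if the sequences are too
    # divergent for the model (non-positive argument).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

Because the correction grows faster than the raw mismatch proportion, K2P distances such as those reported below are somewhat larger than the corresponding uncorrected percent divergences.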

Results and Discussion

EST Analysis and Comparison of the Nuclear Genes

Normalized cDNA libraries were constructed for Pacific L. salmonis, Atlantic L. salmonis, C. clemensi, C. rogercresseyi, and L. branchialis. The 114,967 clones obtained from these cDNA libraries (28,032 Pacific L. salmonis, 51,607 Atlantic L. salmonis, 7,680 C. clemensi, 19,200 C. rogercresseyi, and 8,448 L. branchialis) were sequenced with M13 forward and M13 reverse primers (L. salmonis and C. rogercresseyi) or with M13 forward and SP6 primers (C. clemensi and L. branchialis). A summary of the EST project is shown in Table 1. From these clones, 170,418 high-quality ESTs were obtained from Pacific L. salmonis (49,672 ESTs), Atlantic L. salmonis (57,349 ESTs), C. clemensi (14,821 ESTs), C. rogercresseyi (32,135 ESTs), and L. branchialis (16,441 ESTs). The average trimmed length of these ESTs was 734 bp. These EST sequences are available in GenBank.

Table 1 Sea lice EST project summary

The 49,672 Pacific L. salmonis ESTs obtained in this study along with 14,994 Pacific L. salmonis ESTs from our previous study (Yazawa et al. 2008) were assembled into 11,922 contigs and 4,186 singletons (16,108 putative transcripts). There is a total of 14,466 putative transcripts for Atlantic L. salmonis, 6,054 for C. clemensi, 11,357 for C. rogercresseyi, and 6,438 for L. branchialis. These putative transcripts were annotated using RPS-BLAST and BLASTX comparisons with the CDD and SwissProt (Bairoch and Apweiler 1996), respectively. The best match (E value threshold of 1 E−10) was used to identify putative transcripts. Of the 16,108 Pacific L. salmonis putative transcripts, 7,157 (44.4%) matched at least one entry in the databases while the others remain unidentified. Similarly, 6,726 (46.5%) Atlantic L. salmonis, 3,775 (62.4%) C. clemensi, 5,830 (51.3%) C. rogercresseyi, and 3,951 (61.4%) L. branchialis putative transcripts have significant BLAST hits (Table 1).
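As a quick consistency check on the figures above, the annotated fractions can be recomputed from the reported counts. The snippet uses only the numbers given in this paragraph; the helper name `annotated_percent` is illustrative.

```python
# (putative transcripts, transcripts with a significant BLAST hit), as reported in the text
reported = {
    "Pacific L. salmonis": (16108, 7157),
    "Atlantic L. salmonis": (14466, 6726),
    "C. clemensi": (6054, 3775),
    "C. rogercresseyi": (11357, 5830),
    "L. branchialis": (6438, 3951),
}


def annotated_percent(total, annotated):
    """Percentage of putative transcripts with a significant hit, to one decimal."""
    return round(100.0 * annotated / total, 1)


for species, (total, annotated) in reported.items():
    print(f"{species}: {annotated_percent(total, annotated)}% annotated")
```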

A collection of reference FLcDNA clones is an important resource for identifying genes, determining their structural features, and for experimental analysis of gene functions. Possible reference FLcDNAs were defined as having an entire open reading frame (ORF) corresponding to a full-length protein and were identified as described previously (Leong et al. 2010). Using an E value filter of E ≤ 1 E−5, the top ten SwissProt high-scoring segment pairs (HSPs) from BLASTX for each putative transcript were analyzed in succession to identify the correct ORF. Of the 16,108 Pacific L. salmonis putative transcripts, 1,435 transcripts were identified as possible FLcDNAs. There are 1,086 Atlantic L. salmonis FLcDNAs, 1,223 C. clemensi FLcDNAs, and 1,574 C. rogercresseyi FLcDNAs. These reference FLcDNAs were submitted to NCBI’s FLIC database.

A relational database with an intuitive web interface was developed to process and display the large quantities of EST data, their assemblies, and their associated annotation information (Fig. 1). This interface provides the ability to search using sequence data, identifiers, accession numbers, and descriptive keywords. The BLAST search allows users to perform homology searches with sequences of interest, identify potential transcript names, and then visualize these sequences and their EST alignments. The EST contigs are displayed together with their predicted ORFs and BLASTX HSPs in a single view. This database contributes to the identification and analysis of proteins and to the development of microarrays for gene expression analyses.

Fig. 1

Screenshot of sea lice EST contig summary and search tools. The top panel allows users to perform homology searches for sequences of interest. The second provides the ability to search using sequence data, identifiers, accession numbers, and descriptive keywords. The third to seventh panels show a summary of the EST clustering results of C. clemensi, C. rogercresseyi, Pacific L. salmonis, Atlantic L. salmonis, and L. branchialis, respectively.

Sequence similarities and putative transcripts were compared among the nuclear genes of the five copepods (Pacific L. salmonis, Atlantic L. salmonis, C. clemensi, C. rogercresseyi, and L. branchialis) by BLASTN for nucleotide (nt) sequences and tBLASTX for amino acid (aa) sequences (Table 2). We previously reported that a total of 155 nuclear genes from Pacific and Atlantic L. salmonis showed an average of 96.8% nt identity over an average of 756 bp (Yazawa et al. 2008). In this study, a total of 8,121 nucleotide and 8,827 translated aa sequences matched between the Pacific and Atlantic L. salmonis putative transcripts. These sequences showed an average of 96% identity at the nt level over an average of 626 bp and 88% at the aa level over an average of 187 aa (Table 2). Nuclear gene sequences were quite different not only between the genera Caligus and Lepeophtheirus (81–82% nt, 70–72% aa identities), but also between the two Caligus species (83% nt, 71% aa identities; Table 2). The range of nuclear gene sequence divergence was quite similar among these species (17–19% nt and 28–30% aa sequence divergences). As expected, nucleotide sequences of L. branchialis, the only species examined from the family Pennellidae, were very different from the caligid sequences: only 4–6% of the total queries (254–405 sequences) matched the nuclear genes of the four other copepods. We speculate that the matched genes are conserved among copepods and therefore we could not determine the divergence between nt sequences of L. branchialis and the four caligid copepods. However, the 2,634–3,375 translated aa sequences of L. branchialis (44–52% of query sequences) did show significant matches with sequences of the four other copepods. These translated aa sequences showed 59–62% identities over averages of 121–132 aa (Table 2). 
Although these comparisons provide only a very rough estimate of overall sequence similarity, they clearly indicate a high level of sequence divergence among the nuclear genes of these copepods.

Table 2 Comparison of the Pacific and Atlantic L. salmonis, C. clemensi, C. rogercresseyi, and L. branchialis nuclear genes

Mitochondrial Genome Sequences of L. salmonis, C. clemensi, and C. rogercresseyi

Metazoan mt genomes typically range between 15 and 20 kb in size and contain 37 genes: 13 protein-encoding genes (PCGs), 22 transfer RNA (tRNA) genes, and two ribosomal RNA (rRNA) genes, together with a major non-coding region (NCR; Boore 1999). In this study, whole mt genome sequences of two Caligus species, C. clemensi and C. rogercresseyi, were determined. The sizes of the entire mt genomes were 13,440 bp for C. clemensi [GenBank: HQ157566] and 13,468 bp for C. rogercresseyi [GenBank: HQ157565], and thus, these mt genomes are the shortest among the 57 crustacean mt genomes (average length: 15,785 bp) reported so far (GenBank: November 2010). There are two reasons for the small size of these mt genomes. First, the major NCRs of the C. clemensi (104 bp) and C. rogercresseyi (129 bp) mt genomes were much shorter than that of L. salmonis (Pacific form, 2,146 bp; Atlantic form, 1,441 bp) and that of other crustaceans (average length, 875 bp), except for that of the amphipod Metacrangonyx longipes (76 bp; Bauzà-Ribot et al. 2009). Second, while both Caligus mt genomes contained 12 protein-encoding, 21 tRNA, and two rRNA genes of the typical set found in other animal mt genomes, both lacked one PCG, nad4L, and one tRNA gene, trnL2 (CUN).

Interestingly, the C. clemensi mt genome is adenine and thymine (A + T)-rich (PCG, 74.5%; whole genome, 75.6%) compared to C. rogercresseyi and L. salmonis (PCG, 63.6–64.9%; whole genome, 65.2–66.5%; Supplemental Table 2). In crustaceans, the mt genomic A + T content values range from 60.9% for Ligia oceanica (Isopoda; Kilpert and Podsiadlowski 2006) to 77.8% for Argulus americanus (Branchiura; Lavrov et al. 2004). The reason for the variability in A + T richness within the mitochondrial genome among taxa is not clear.
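The A + T content values quoted above are simple base-composition percentages. A minimal sketch of the calculation is given below; it assumes that only unambiguous bases (A, C, G, T) are counted, and the function name `at_content` is illustrative rather than taken from any tool used in the study.

```python
def at_content(sequence):
    """Return the A + T content of a DNA sequence as a percentage.

    Assumption: ambiguity codes and gap characters are excluded from
    both the numerator and the denominator.
    """
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    at_bases = sum(1 for base in counted if base in "AT")
    return 100.0 * at_bases / len(counted)
```

Applied to a whole mt genome sequence or to the concatenated PCGs, this gives the two kinds of values reported in the text (whole genome and PCG).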

Like the nuclear genes, the mtDNA gene sequences also exhibited large divergence, not only between L. salmonis and the two Caligus species (66.7–68.8% nt and 64.2–65.4% aa identities), but also between the two Caligus species (68.8% nt and 63.6% aa identities). The range of mtDNA sequence divergence was quite similar among the three caligid copepods. The percent nt and aa identities among the L. salmonis, C. clemensi, and C. rogercresseyi sequences are 63.6–68.8% (Table 3). The cox1 gene is the most conserved PCG among the three mt genomes (79.1–82.6% nt and 91.2–94.1% aa identities), while nad2, nad4, nad5, and nad6 exhibit a large sequence divergence (56.1–62.2% nt and 40.0–51.9% aa identities; Table 3).

Table 3 Comparison of the L. salmonis, C. clemensi, and C. rogercresseyi mtDNA genes

Hebert et al. (2003) reported that cox1 divergences among the 13,320 species in the animal kingdom ranged from a low of 0.0% to a high of 53.7%, with a mean divergence of 11.3%. Within the Crustacea, the mean species divergence of cox1 was 15.4% (Hebert et al. 2003). Interestingly, the cox1 divergences among the three caligid copepods in the present study were higher than the mean divergence for Crustacea. The cox1 interspecific divergence is 20.2% between C. clemensi and C. rogercresseyi and 26.0% between the genera Caligus and Lepeophtheirus. Øines and Schram (2008) compared cox1 fragments (a total of 504 aligned base pairs) from 18 caligid copepods and 16S rRNA fragments (a total of 438 aligned base pairs) from 11 caligid copepods. They found an average K2P distance of 0.218 for cox1 and 0.221 for 16S rRNA (Øines and Schram 2008). In the present study, the K2P distance of cox1 (a total of 1,539 aligned base pairs) among L. salmonis, C. clemensi, and C. rogercresseyi is 0.202–0.270 (Supplemental Table 3), which is similar to the average K2P distance found by Øines and Schram (2008). However, the 16S rRNA gene showed a much higher genetic distance among the three copepods. The K2P distance of the 16S rRNA gene (a total of 1,085 aligned base pairs) was 0.333 between C. clemensi and C. rogercresseyi and 0.422 (Supplemental Table 3). These molecular distance values support an ancient separation between C. clemensi and C. rogercresseyi as well as between Lepeophtheirus and Caligus.

In our previous study, a molecular clock based on 16S rRNA and calibrated with copepod data suggested that the forms of L. salmonis existing in the Pacific and Atlantic Oceans evolved from a common ancestor following a separation that occurred 2.5–11 million years ago (Yazawa et al. 2008). In this study, the age of divergence between L. salmonis (Pacific) and the two Caligus species was estimated from the 16S rRNA gene using the same method as previously reported (Yazawa et al. 2008). The results suggest that the separation between L. salmonis (Pacific) and the two Caligus species occurred approximately 45–113 million years ago (Table 4). In addition, the separation between the two Caligus species was estimated to have occurred 37–87 million years ago (Table 4). Salmonids are believed to have evolved from an ancestor in which a whole genome duplication event occurred 25–100 million years ago (Ohno 1970). Thus, our present results suggest that L. salmonis and C. clemensi have been in existence for 45–106 million years and that their parasitic association with salmonids is likely also quite ancient (Table 4).
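The principle behind such estimates is that, under a strict molecular clock, divergence time is the pairwise genetic distance divided by the pairwise divergence rate, so a slow and a fast calibration bracket a range of ages. The sketch below illustrates only this arithmetic. The two rates are hypothetical placeholders chosen for illustration; the calibrations actually used are those of Table 4 and Yazawa et al. (2008) and are not reproduced here.

```python
def divergence_time_range(k2p_distance, slow_rate, fast_rate):
    """Return (youngest, oldest) divergence time in million years.

    k2p_distance: pairwise distance in substitutions per site.
    slow_rate, fast_rate: pairwise divergence rates in substitutions
    per site per million years (hypothetical calibration values).
    """
    if not 0 < slow_rate <= fast_rate:
        raise ValueError("rates must satisfy 0 < slow_rate <= fast_rate")
    youngest = k2p_distance / fast_rate  # fast clock -> more recent split
    oldest = k2p_distance / slow_rate    # slow clock -> more ancient split
    return youngest, oldest


# Illustrative only: the 16S rRNA K2P distance between the two Caligus
# species reported above, combined with placeholder rates of 0.4% and
# 0.9% pairwise divergence per million years.
youngest, oldest = divergence_time_range(0.333, slow_rate=0.004, fast_rate=0.009)
print(f"{youngest:.1f} to {oldest:.1f} million years")
```

The width of the resulting interval reflects uncertainty in the calibration rather than in the measured distance, which is why the reported ranges span roughly a factor of two or more.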

Table 4 Ranges of 16S rRNA gene divergence based on Kimura two-parameter distance and crustacean molecular clock calibrations

The order of the genes in the two Caligus mt genomes is identical despite extensive sequence divergence. In contrast, it is quite different from that in the L. salmonis mt genome. The gene arrangement in the region between nad4 and trnL1 (UUR; approximately 10 kb) is well conserved between L. salmonis and the Caligus species. However, the gene arrangements adjacent to their control regions (CRs) are very distinct, and the Caligus mt genomes show a novel gene arrangement (Fig. 2). The region around the CR is more prone to gene rearrangement in both vertebrate (Macey et al. 1997) and invertebrate (Dowton and Austin 1999) mt genomes. In the L. salmonis mt genomes, the genes between trnK2 and trnR (six tRNA genes and atp6) are contiguous (Tjensvoll et al. 2005; Yazawa et al. 2008). However, in the Caligus mt genomes, this region is interrupted by rrnS-nad6-trnA-trnK1-trnQ-trnT-cytb-CR and divided into trnK2-trnN-trnG-trnV and atp6-trnY-trnR (trnY also changed position; Fig. 2). As mentioned above, the nad4L and trnL2 (CUN) genes are absent in the Caligus mt genomes. These two genes normally reside in this region and have probably been lost due to rearrangement. It is likely that this rearrangement event also led to the trimming of the CRs in the two Caligus mt genomes.

Fig. 2

Genomic organization of the C. clemensi (13,440 bp) and the C. rogercresseyi (13,468 bp) mt genomes. The complete mt genomes of the Atlantic (15,445 bp) and Pacific (16,148 bp) L. salmonis were previously reported, and these mt genomes are identical in gene organization (Tjensvoll et al. 2005; Yazawa et al. 2008). Boxes represent mtDNA genes. tRNA genes are denoted by the single letter amino acid code, and an underline indicates tRNA genes located on the negative strand. rrnL and rrnS refer to 16S and 12S rRNA; cox1, cox2, and cox3 refer to cytochrome oxidase subunits I, II, and III; cob refers to cytochrome b; nad1–6 and nad4L refer to NADH dehydrogenase subunits 1–6 and 4L; atp6 and atp8 refer to ATP synthase subunits 6 and 8, respectively; and CR refers to control region. Transcription directions for the protein-coding and rRNA genes are shown by arrowheads.

In the mt genomes of most animals, nad4L and atp8 are located together with nad4 and atp6, respectively (nad4L-nad4 and atp8-atp6), and nad4L-nad4 and atp8-atp6 are each translated from a single mRNA (Amalric et al. 1978; Berthier et al. 1986). In contrast, several genes separate nad4 and nad4L in the mt genomes of L. salmonis and in the mt genomes of all copepods characterized so far: Tigriopus japonicus (Machida et al. 2002), Tigriopus californicus (Burton et al. 2007), Paracyclopina nana (Ki et al. 2009), and the partially sequenced mt genomes of Eucalanus bungii and Neocalanus cristatus (Machida et al. 2004). The atp6 and atp8 genes are also separated in the two Caligus species and in L. salmonis (Fig. 2). In addition, it has been reported that atp8 is absent in the mt genome of P. nana (Ki et al. 2009). Thus, it is most likely that these separations of nad4-nad4L and atp6-atp8 occurred during copepod evolution and led to the loss of nad4L in the two Caligus species and to the loss of atp8 in P. nana.

In summary, the mtDNA genes of the two Caligus species showed high levels of sequence divergence (Table 3). The A+T content is also quite different between the two Caligus mt genomes (Supplemental Table 2). In addition, the orders of the genes in the two Caligus mt genomes are identical to each other, but different from the order in the L. salmonis mt genome (Fig. 2).

Sea Lice as Ectoparasite Model System

Since parasites by definition depend on a live host for growth and survival, in vitro culture systems are typically very difficult to establish. Although procedures for experimental infections are established for some parasitic species, manipulation of the parasites may still be very difficult since removing them from the host is generally lethal for the parasite. Sea lice have life cycle features that make them promising as a model system. These features, which include free-living larval stages as well as pre-adult and adult stages that can move unrestricted on the host surface, enable manipulation of these parasites. For L. salmonis, recent advances in larval production systems and infection procedures (see Hamre et al. 2009) have been crucial for the establishment of defined laboratory strains of the salmon louse with different properties (e.g., drug-resistant strains, inbred strains). Stable and predictable production conditions further enable specific breeding to create various types of hybrids (e.g., susceptible and drug-resistant family groups). The improvement of rearing facilities has been a crucial facilitator for the establishment of RNAi in L. salmonis (Dalvin et al. 2009). Systemic RNAi is easily achieved in pre-adult or adult lice by injection of dsRNA into the animal. In addition, soaking free-living larval stages (e.g., copepodids) in dsRNA enables RNAi in copepodids (Campbell et al. 2009). Furthermore, the genomes of both the Pacific and Atlantic variants of L. salmonis are currently being sequenced, and together with the present cDNA resources this will open up a new avenue in sea lice research. Arthropod parasites are highly diverse, yet good experimental parasite model systems are scarce. We anticipate that experimental studies on the salmon louse and other sea lice species will increase our knowledge about ectoparasites in general, particularly when more parasite genomes become available.

Conclusions

We sequenced over 150,000 ESTs from Pacific L. salmonis (49,672 new ESTs in addition to 14,994 previously reported ESTs), Atlantic L. salmonis (57,349 ESTs), C. clemensi (14,821 ESTs), C. rogercresseyi (32,135 ESTs), and L. branchialis (16,441 ESTs; Table 1). A relational database with an intuitive web interface was developed to process and display the large quantities of EST data, their assemblies and associated annotation information, as well as possible full-length gene information (Fig. 1). This database provides a novel resource for the study of sea louse biology, population genetics, and control strategies. This genomic resource represents the largest compilation of any copepod species and provides the material basis for the development of a 38K microarray that can be used in conjunction with our existing salmon 44K microarray to study host–parasite interactions at the molecular level.

The nuclear genes showed a high level of sequence divergence among the copepods examined: L. salmonis, C. clemensi, C. rogercresseyi, and L. branchialis (Table 2). In addition, whole mt genome sequences of two Caligus species, C. clemensi (13,440 bp) and C. rogercresseyi (13,468 bp), were determined and compared. The L. salmonis, C. clemensi, and C. rogercresseyi mtDNA genes also exhibited extensive sequence divergence, ranging among these species from 66.7% to 68.8% nt and from 63.6% to 65.4% aa identities (Table 3). Both nuclear and mtDNA genes showed very high levels of sequence divergence between these ectoparasitic copepods, which suggests that they have been in existence for 37–113 million years and that parasitic association with marine organisms is likely also quite ancient. The order of the genes is the same in the two Caligus mt genomes but differs from that in the L. salmonis mt genome (Fig. 2). The large sequence divergence observed among these copepods may help to explain the extensive variety of morphology, life history, and host association in copepods.