Genomic Resources for Sea Lice: Analysis of ESTs and Mitochondrial Genomes
- Cite this article as:
- Yasuike, M., Leong, J., Jantzen, S.G. et al. Mar Biotechnol (2012) 14: 155. doi:10.1007/s10126-011-9398-z
Sea lice are common parasites of both farmed and wild salmon. Salmon farming constitutes an important economic market in North America, South America, and Northern Europe. Infections with sea lice can result in significant production losses. A compilation of genomic information on different genera of sea lice is an important resource for understanding their biology as well as for the study of population genetics and control strategies. We report on over 150,000 expressed sequence tags (ESTs) from five different species (Pacific Lepeophtheirus salmonis (49,672 new ESTs in addition to 14,994 previously reported ESTs), Atlantic L. salmonis (57,349 ESTs), Caligus clemensi (14,821 ESTs), Caligus rogercresseyi (32,135 ESTs), and Lernaeocera branchialis (16,441 ESTs)). For each species, ESTs were assembled into complete or partial genes and annotated by comparisons to known proteins in public databases. In addition, whole mitochondrial (mt) genome sequences of C. clemensi (13,440 bp) and C. rogercresseyi (13,468 bp) were determined and compared to L. salmonis. Both nuclear and mtDNA genes show very high levels of sequence divergence between these ectoparasitic copepods, suggesting that the different species of sea lice have been in existence for 37–113 million years and that parasitic association with salmonids is also quite ancient. Our ESTs and mtDNA data provide a novel resource for the study of sea louse biology, population genetics, and control strategies. This genomic information provides the material basis for the development of a 38K sea louse microarray that can be used in conjunction with our existing 44K salmon microarray to study host–parasite interactions at the molecular level. This report represents the largest genomic resource for any copepod species to date.
Keywords: Lepeophtheirus salmonis; Caligus clemensi; C. rogercresseyi; Lernaeocera branchialis; Expressed sequence tags (ESTs); Mitochondrial genome
Copepods (Copepoda) are a group of small crustaceans found in various aquatic environments and they are described as the most abundant metazoans on earth (Humes 1994). The subclass Copepoda consists of over 250 described families, 2,600 genera, and 21,000 described species classified into ten orders (Walter and Boxshall 2008). Their life histories are diverse; planktonic and benthic copepods are an important ecological link in the aquatic food chain (Gee 1987; Ohman and Hirche 2001), but approximately one third of marine copepod species live as associates, commensals, or parasites on invertebrates and fishes (Humes 1994).
Parasitic copepods are commonly found both on farmed and wild marine finfish (Johnson and Fast 2004). They feed on host mucus, epidermal cells, tissues, and blood, the result of which causes physiological stress, immune dysfunction, impairment of swimming ability, and possibly death (Boxaspen 2006; Costello 2006; Johnson and Fast 2004; Tully and Nolan 2002). Members of the family Caligidae, especially the genera Caligus and Lepeophtheirus, are commonly referred to as sea lice (Costello 2006; Johnson et al. 2004; Pike and Wadsworth 1999). They are the most economically important parasites of the world salmon farming industry and may cause direct and indirect economic losses in the industry of €300 million (US$480 million) annually (Costello 2009). In addition, there is concern that salmon farms elevate the risk of sea lice infections on wild salmon beyond that which naturally occurs and lead to a depression in the abundance of wild salmon stocks (Costello 2006; Heuch et al. 2005; Krkošek et al. 2007a; Krkošek et al. 2007b; Todd et al. 2006).
In the North Atlantic Ocean, Lepeophtheirus salmonis and Caligus elongatus account for the most serious infestations of cultured and wild salmonids (Johnson et al. 2004; Pike and Wadsworth 1999). In the eastern north Pacific Ocean, L. salmonis and Caligus clemensi have been found on farmed Atlantic salmon (Salmo salar) and wild Pacific salmon (Oncorhynchus spp.; Beamish et al. 2009; Beamish et al. 2005; Saksida et al. 2007). While L. salmonis is prevalent on both the Atlantic and Pacific coasts, earlier studies suggested that the Pacific and Atlantic populations of L. salmonis are genetically distinct (Tjensvoll et al. 2006; Todd et al. 2004). More recent genomic studies strongly suggest that distinct species of L. salmonis exist in the Pacific and Atlantic Oceans following a separation that occurred from 2.5 to 11 million years ago (Boulding et al. 2009; Yazawa et al. 2008). These parasites are referred to herein as the Pacific and Atlantic forms of L. salmonis, respectively. In the southern hemisphere, Caligus rogercresseyi is the dominant species affecting salmonid aquaculture in Chile, where the parasites were found on farmed salmon in 99% of the established cultured cages (Boxshall and Bravo 2000; Carvajal et al. 1998).
Lepeophtheirus and Caligus species are distinguished from each other based on morphological characters (Kabata 1979). The life cycle in L. salmonis has a total of ten developmental stages, while C. elongatus and C. rogercresseyi are similar but appear to lack pre-adult stages (Piasecki and MacKinnon 1995; González and Carvajal 2003). The host range of L. salmonis mainly includes salmonids but the parasite has also been reported from non-salmonid hosts, including sticklebacks, that co-occur with salmon (Jones et al. 2006). In contrast, some Caligus species have a broad host range of salmonids and non-salmonids (Costello 2006; Johnson et al. 2004). Among its salmonid hosts, L. salmonis displays clear preferences, with heaviest infestations and greatest impacts occurring on Atlantic salmon (S. salar) and sea trout (Salmo trutta) followed by rainbow trout (Oncorhynchus mykiss), chinook (Oncorhynchus tshawytscha), and coho salmon (Oncorhynchus kisutch; Dawson et al. 1997; Fast et al. 2002; Johnson and Albright 1992). In contrast, C. rogercresseyi occurs in higher numbers on caged rainbow trout compared with Atlantic or coho salmon (González et al. 2000). Thus, while L. salmonis and Caligus species exhibit similar parasitic life history strategies, they display considerable differences in morphology, life cycle, and host range.
Another parasite, Lernaeocera branchialis, belongs to the copepod family Pennellidae and is distantly related to the caligid copepods. This species is commonly found on gadoids, particularly Atlantic cod (Gadus morhua) and haddock (Melanogrammus aeglefinus), in the North Atlantic Ocean and North Sea (Bricknell et al. 2006; Smith et al. 2007). This parasite has a negative impact on wild gadoids and is a potentially serious pathogen of farmed Atlantic cod (Smith et al. 2007). A compilation of genomic information on parasitic copepods is an important tool for understanding their biology as well as for the study of population genetics and control strategies.
In this study, we report on over 150,000 expressed sequence tags (ESTs) obtained from Pacific L. salmonis (49,672 new ESTs in addition to 14,994 previously reported ESTs), Atlantic L. salmonis (57,349 ESTs), C. clemensi (14,821 ESTs), C. rogercresseyi (32,135 ESTs), and L. branchialis (16,441 ESTs). These ESTs were assembled into complete or partial genes and annotated by comparisons to known proteins in public databases. In addition, whole mitochondrial (mt) genome sequences of two Caligus species, C. clemensi and C. rogercresseyi, were determined and compared to each other and to L. salmonis. These studies show high levels of sequence divergence in nuclear and mtDNA genes. This report describes the production and characteristics of the largest genomic resource for copepods.
Materials and Methods
Specimens belonging to the Pacific (British Columbia, Canada (BC)) and Atlantic forms of L. salmonis (Norway and New Brunswick, Canada), C. clemensi (BC), C. rogercresseyi (Chile), and L. branchialis (Scotland, UK) were collected and stored at −80°C or in RNAlater (Invitrogen) until RNA extraction. Total RNA was extracted from whole bodies (from various life stages and both sexes) using TRIzol reagent (Invitrogen) and spin-column purified using RNeasy Mini kits (Qiagen). The purified RNAs were then quantified and quality checked by spectrophotometer (NanoDrop Technologies) and agarose gel, respectively. Approximately 1.0–3.0 μg of total RNA was converted into cDNA and normalized and was directionally cloned into pAL 17.3 vector (Evrogen Co.).
Clones from each library were robotically arrayed in 384-well microtiter plates as detailed previously (Koop et al. 2008). Plasmid DNAs were extracted and sequenced on an ABI 3730 DNA analyzer (Applied Biosystems) with M13 forward and M13 reverse primers (L. salmonis and C. rogercresseyi) or with M13 forward and SP6 primers (C. clemensi and L. branchialis). These sequence primers are shown in supplemental Table 1. The resulting ESTs were assembled with CAP3 (Huang and Madan 1999) with default parameters. The assembled total contigs (clusters + singletons) were annotated using RPS-BLAST and BLASTX comparisons with the Conserved Domain Database (CDD) and SwissProt (Bairoch and Apweiler 1996), respectively. The best BLAST match (E value threshold of 1 E−10) was used to identify contigs. Contigs that did not meet this threshold were annotated as “unknown.”
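The best-match rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: it assumes the BLAST results are available in the standard 12-column tabular format (query ID in column 1, subject ID in column 2, E value in column 11), and the function name and input layout are hypothetical.

```python
E_VALUE_THRESHOLD = 1e-10  # threshold stated in the text (1 E-10)


def annotate_best_hits(tabular_lines, contig_ids, threshold=E_VALUE_THRESHOLD):
    """Return {contig_id: best subject ID, or "unknown"}.

    Assumes tab-separated BLAST output with the query ID in column 1,
    the subject ID in column 2 and the E value in column 11.
    """
    best = {}  # contig_id -> (subject_id, evalue)
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue = float(fields[10])
        if evalue > threshold:
            continue  # hit does not meet the significance threshold
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    # Contigs without any qualifying hit are labelled "unknown".
    return {cid: best[cid][0] if cid in best else "unknown" for cid in contig_ids}
```

A contig with hits at E values of 1e-50 and 1e-20 would be labelled with the first subject, whereas a contig whose only hit has an E value of 1e-5 would be labelled "unknown".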
Reference full-length cDNAs (FLcDNAs) were identified as detailed previously (Leong et al. 2010). A single clone containing an entire coding sequence (CDS) for a gene product is considered a reference FLcDNA.
Complete Mitochondrial Genome Sequences of C. clemensi and C. rogercresseyi
The total genomic DNAs were extracted from an adult male C. clemensi and C. rogercresseyi as previously described (Yazawa et al. 2008). A sample placed in 5% Chelex-100 resin (Sigma) solution (5% Chelex-100 resin, 0.2% SDS in TE, with proteinase K (100 μg/ml)) was incubated for 30 min at 55°C, and the proteinase K was then inactivated for 10 min at 90°C. The sequence determination of the complete C. rogercresseyi mt genome was carried out as previously described (Yazawa et al. 2008). The PCR primer sets that were used were designed for 15 fragments (Supplemental Table 1) based on the EST sequences encoding mtDNA. PCR amplification was performed using 1.0 μl of extracted total genomic DNA of C. rogercresseyi with an initial denaturation step of 2 min at 95°C and then 30 cycles as follows: 30 s of denaturation at 95°C, 30 s of annealing at 55°C, and 3 min of extension at 72°C. PCR products were cloned into pCR2.1 vector (TA Cloning Kit, Invitrogen) with the manufacturer’s protocol, and each positive PCR product was sequenced as described above. The entire mt genome for C. clemensi was amplified by a long PCR method for three long fragments (5.4, 5.0, and 3.0 kb) and by PCR as described above for one short fragment (0.8 kb). The three PCR fragments were amplified using the PCR primer sets shown in Supplemental Table 1 and by using Long PCR Enzyme mix (Fermentas) following the manufacturer’s protocol. The long PCR amplification was performed using 100 ng of extracted total genomic DNA of C. clemensi with an initial denaturation step of 2 min at 94°C and then a two-step PCR procedure (40 cycles of 95°C for 10 s and 68°C for 7 min), and 10 min of final extension. The three long PCR products were cloned into pCR-XL-TOPO vector (Invitrogen) with the manufacturer’s protocol, and each positive PCR product was sequenced by primer walking (supplemental Table 1). The one short fragment was cloned into pCR2.1 vector and sequenced as described above.
Protein-coding and rRNA genes of C. clemensi and C. rogercresseyi were identified by alignment with the Pacific L. salmonis mt gene sequences (GenBank: EU288200). The majority of the tRNA genes were identified using tRNAscan-SE 1.21 (Lowe and Eddy 1997), using the same parameters as described by Tjensvoll et al. (2005). The remaining tRNA genes were identified based on sequence homology with L. salmonis tRNA sequences.
Pair-wise Kimura two-parameter (K2P) distances (Kimura 1980) of the 16S rRNA and cox1 genes for C. clemensi, C. rogercresseyi, and Pacific L. salmonis were calculated in MEGA5 (Tamura et al. 2007), with default settings.
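For readers unfamiliar with the K2P model, the distance between two aligned sequences is d = −½ ln[(1 − 2P − Q)·√(1 − 2Q)], where P and Q are the proportions of transitional and transversional differences. The sketch below is a minimal stand-alone illustration of that formula, not a reproduction of the MEGA5 computation; the function name is hypothetical, and skipping alignment columns that contain a gap or ambiguous base is an assumption of this example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # assumption: ignore gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # The logarithm is undefined for saturated sequences (math.log raises).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

With one transition in ten compared sites (P = 0.1, Q = 0), the function returns −½ ln(0.8), roughly 0.112.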
Results and Discussion
EST Analysis and Comparison of the Nuclear Genes
Table 1 Sea lice EST project summary
Columns: L. salmonis (P); L. salmonis (A)
Rows: Number of clones; Number of sequences; Average trimmed EST length (bp); Number of contigs; Number of singletons; Number of putative transcripts; Maximum contig size (no. of ESTs); Average contig size (no. of ESTs); Number of transcripts with BLAST hits; Percent with significant BLAST hits
The 49,672 Pacific L. salmonis ESTs obtained in this study along with 14,994 Pacific L. salmonis ESTs from our previous study (Yazawa et al. 2008) were assembled into 11,922 contigs and 4,186 singletons (16,108 putative transcripts). There is a total of 14,466 putative transcripts for Atlantic L. salmonis, 6,054 for C. clemensi, 11,357 for C. rogercresseyi, and 6,438 for L. branchialis. These putative transcripts were annotated using RPS-BLAST and BLASTX comparisons with the CDD and SwissProt (Bairoch and Apweiler 1996), respectively. The best match (E value threshold of 1 E−10) was used to identify putative transcripts. Of the 16,108 Pacific L. salmonis putative transcripts, 7,157 (44.4%) matched at least one entry in the databases while the others remain unidentified. Similarly, 6,726 (46.5%) Atlantic L. salmonis, 3,775 (62.4%) C. clemensi, 5,830 (51.3%) C. rogercresseyi, and 3,951 (61.4%) L. branchialis putative transcripts have significant BLAST hits (Table 1).
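The percentages quoted above follow directly from the transcript counts. As a quick arithmetic check, the short sketch below recomputes them; all counts are taken from the paragraph above, while the dictionary layout and function name are only illustrative.

```python
# (putative transcripts, transcripts with significant BLAST hits), from the text
TRANSCRIPT_COUNTS = {
    "Pacific L. salmonis": (16108, 7157),
    "Atlantic L. salmonis": (14466, 6726),
    "C. clemensi": (6054, 3775),
    "C. rogercresseyi": (11357, 5830),
    "L. branchialis": (6438, 3951),
}


def annotation_rate(total, with_hits):
    """Percentage of putative transcripts with a significant BLAST hit."""
    return round(100.0 * with_hits / total, 1)


if __name__ == "__main__":
    # Contigs plus singletons give the putative transcript count (Pacific form).
    assert 11922 + 4186 == 16108
    for species, (total, hits) in TRANSCRIPT_COUNTS.items():
        print(f"{species}: {annotation_rate(total, hits)}%")
```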
A collection of reference FLcDNA clones is an important resource for identifying genes, determining their structural features, and for experimental analysis of gene functions. Possible reference FLcDNAs were defined as having an entire open reading frame (ORF) corresponding to a full-length protein and were identified as described previously (Leong et al. 2010). Using an E value filter of E ≤ 10^−5, the top ten SwissProt high-scoring segment pairs (HSPs) from BLASTX for each putative transcript were analyzed in succession to identify the correct ORF. Of the 16,108 Pacific L. salmonis putative transcripts, 1,435 transcripts were identified as possible FLcDNAs. There are 1,086 Atlantic L. salmonis FLcDNAs, 1,223 C. clemensi FLcDNAs, and 1,574 C. rogercresseyi FLcDNAs. These reference FLcDNAs were submitted to NCBI’s FLIC database.
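To make the notion of "an entire ORF" concrete, the sketch below scans the three forward reading frames of a sequence for the longest stretch that begins with ATG and ends with an in-frame stop codon. It is a deliberately simplified stand-in: the procedure of Leong et al. (2010) uses BLASTX HSPs to choose the correct frame and judge completeness, which this example does not attempt. The function name is hypothetical, and the standard nuclear stop codons are assumed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code


def longest_complete_orf(seq):
    """Return (start, end) of the longest ATG...stop ORF on the forward
    strand, using 0-based half-open coordinates, or None if none exists."""
    seq = seq.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon of a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3  # include the stop codon
                if best is None or end - start > best[1] - best[0]:
                    best = (start, end)
                start = None  # look for the next ORF in this frame
    return best
```

A transcript whose BLASTX-supported coding region is fully contained in such an ORF would be a candidate reference FLcDNA under this simplified view.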
Table 2 Comparison of the Pacific and Atlantic L. salmonis, C. clemensi, C. rogercresseyi, and L. branchialis nuclear genes
Columns: No. of queries; No. of matches; Percentage with match; Standard deviation for length; Standard deviation for % identities; Average positive AAs
Rows: Atlantic form L. salmonis vs Pacific form L. salmonis; C. clemensi vs Pacific form L. salmonis; C. clemensi vs Atlantic form L. salmonis; C. clemensi vs C. rogercresseyi; C. clemensi vs L. branchialis; C. rogercresseyi vs Pacific form L. salmonis; C. rogercresseyi vs Atlantic form L. salmonis; L. branchialis vs Pacific form L. salmonis; L. branchialis vs Atlantic form L. salmonis; L. branchialis vs C. rogercresseyi
Mitochondrial Genome Sequences of L. salmonis, C. clemensi, and C. rogercresseyi
Metazoan mt genomes typically range between 15 and 20 kb in size and contain 37 genes (13 protein-encoding genes (PCGs), 22 transfer RNA (tRNA) genes, and two ribosomal RNA (rRNA) genes) plus a major non-coding region (NCR; Boore 1999). In this study, whole mt genome sequences of two Caligus species, C. clemensi and C. rogercresseyi, were determined. The sizes of the entire mt genomes were 13,440 bp for C. clemensi [GenBank: HQ157566] and 13,468 bp for C. rogercresseyi [GenBank: HQ157565], and thus, these mt genomes are the shortest among 57 crustacean mt genomes (average length: 15,785 bp) reported so far (GenBank: November 2010). There are two reasons for the small size of these mt genomes. First, the major NCRs of the C. clemensi (104 bp) and C. rogercresseyi (129 bp) mt genomes were much shorter than that of L. salmonis (Pacific form, 1,441 bp; Atlantic form, 2,146 bp) and that of other crustaceans (average length, 875 bp), except for that of the amphipod Metacrangonyx longipes (76 bp; Bauzà-Ribot et al. 2009). Second, both Caligus mt genomes contained only 12 protein-encoding, 21 tRNA, and two rRNA genes: relative to the typical gene set found in other animal mt genomes, both lacked the PCG nad4L and a tRNA gene, trnL2 (CUN).
Interestingly, the C. clemensi mt genome is adenine and thymine (A + T)-rich (PCG, 74.5%; whole genome, 75.6%) compared to C. rogercresseyi and L. salmonis (PCG, 63.6–64.9%; whole genome, 65.2–66.5%; Supplemental Table 2). In crustaceans, the mt genomic A + T content values range from 60.9% for Ligia oceanica (Isopoda; Kilpert and Podsiadlowski 2006) to 77.8% for Argulus americanus (Branchiura; Lavrov et al. 2004). The reason for the variability in A + T richness within the mitochondrial genome among taxa is not clear.
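The A + T content reported here is simply the share of adenine and thymine among the bases of a sequence. A minimal sketch of that calculation follows; the function name is hypothetical, and excluding characters other than A, C, G, and T (such as N) from the denominator is an assumption of this example.

```python
def at_content(seq):
    """Percentage of A and T among the unambiguous bases of a DNA sequence."""
    bases = [b for b in seq.upper() if b in "ACGT"]  # drop N, gaps, etc.
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    at_count = sum(1 for b in bases if b in "AT")
    return 100.0 * at_count / len(bases)
```

Applied to a complete mt genome sequence it gives the whole-genome value; applied to the concatenated protein-coding genes it gives the PCG value.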
Table 3 Comparison of the L. salmonis, C. clemensi, and C. rogercresseyi mtDNA genes
Columns: Identity in nucleic sequence (%); Identity in deduced amino acid sequence (%)
Rows: Pacific form L. salmonis vs C. clemensi; Pacific form L. salmonis vs C. rogercresseyi; C. clemensi vs C. rogercresseyi; Atlantic form L. salmonis vs Pacific form L. salmonis
Hebert et al. (2003) reported that cox1 divergences among the 13,320 species in the animal kingdom ranged from a low of 0.0% to a high of 53.7%, with a mean divergence of 11.3%. Within the Crustacea, the mean species divergence for cox1 was 15.4% (Hebert et al. 2003). Interestingly, our present study showed that the cox1 divergences among the three caligid copepods were higher than the mean divergence value for Crustacea. The cox1 interspecific divergence is 20.2% between C. clemensi and C. rogercresseyi and 26.0% between the genera Caligus and Lepeophtheirus. Øines and Schram (2008) compared a cox1 fragment (a total of 504 aligned base pairs) from 18 caligid copepods and a 16S rRNA fragment (a total of 438 aligned base pairs) from 11 caligid copepods. They found an average K2P distance of 0.218 for cox1 and 0.221 for 16S rRNA (Øines and Schram 2008). In the present study, the K2P distance of cox1 (a total of 1,539 aligned base pairs) among L. salmonis, C. clemensi, and C. rogercresseyi is 0.202–0.270 (Supplemental Table 3), which is similar to the average K2P distance found by Øines and Schram (2008). However, the 16S rRNA among the three copepods showed a very high genetic distance. The K2P distance of the 16S rRNA (a total of 1,085 aligned base pairs) was 0.333 between C. clemensi and C. rogercresseyi and 0.422 (Supplemental Table 3). These molecular distance values support an ancient separation between C. clemensi and C. rogercresseyi as well as between Lepeophtheirus and Caligus.
Table 4 Ranges of 16S rRNA gene divergence based on Kimura two-parameter distance and crustacean molecular clock calibrations
Column: Divergence range (Myr)
Rows: Pacific form L. salmonis vs. C. clemensi; Pacific form L. salmonis vs. C. rogercresseyi; C. clemensi vs. C. rogercresseyi
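Converting a K2P distance into a divergence-time range under a molecular clock amounts to dividing the distance by a slow and a fast calibration rate. The sketch below shows that arithmetic only. The two rates in the usage note are placeholder values chosen for illustration, not the crustacean calibrations used in this study, and the function name is hypothetical.

```python
def divergence_time_range(distance, slow_rate, fast_rate):
    """Return (youngest, oldest) divergence time in million years (Myr).

    distance  -- pairwise genetic distance (e.g. a K2P distance)
    slow_rate -- slowest pairwise divergence rate, per Myr (as a fraction)
    fast_rate -- fastest pairwise divergence rate, per Myr (as a fraction)
    """
    if not 0 < slow_rate <= fast_rate:
        raise ValueError("rates must satisfy 0 < slow_rate <= fast_rate")
    youngest = distance / fast_rate  # fast clock gives the more recent split
    oldest = distance / slow_rate    # slow clock gives the more ancient split
    return youngest, oldest
```

For example, with assumed rates of 0.4% and 0.9% pairwise divergence per Myr, `divergence_time_range(0.333, 0.004, 0.009)` spans roughly 37 to 83 Myr for the 16S rRNA distance between C. clemensi and C. rogercresseyi.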
In the mt genomes of most animals, nad4L and atp8 are located together with nad4 and atp6, respectively (nad4L-nad4 and atp8-atp6), and nad4L-nad4 and atp8-atp6 are each translated from a single mRNA (Amalric et al. 1978; Berthier et al. 1986). In contrast, several genes separate nad4 and nad4L in the mt genomes of L. salmonis and in the mt genomes of all copepods characterized so far: Tigriopus japonicus (Machida et al. 2002), Tigriopus californicus (Burton et al. 2007), Paracyclopina nana (Ki et al. 2009), and the partially sequenced mt genomes of Eucalanus bungii and Neocalanus cristatus (Machida et al. 2004). The atp6 and atp8 genes are also separated in the two Caligus species and in L. salmonis (Fig. 2). In addition, it has been reported that atp8 is absent in the mt genome of P. nana (Ki et al. 2009). Thus, it is most likely that these separations of nad4-nad4L and atp6-atp8 occurred during copepod evolution and led to the loss of nad4L in the two Caligus species and to the loss of atp8 in P. nana.
In summary, the mtDNA genes of the two Caligus species showed high levels of sequence divergence (Table 3). The A+T content is also quite different between the two Caligus mt genomes (Supplemental Table 2). In addition, the orders of the genes in the two Caligus mt genomes are identical to each other, but different from the order in the L. salmonis mt genome (Fig. 2).
Sea Lice as Ectoparasite Model System
Since parasites by definition depend on a live host for growth and survival, in vitro culture systems are typically very difficult to establish. Although procedures for experimental infections are established for some parasitic species, manipulation of the parasites may still be very difficult since removing them from the host is generally lethal for the parasite. Sea lice have life cycle features that make them promising as a model system. These features, consisting of free-living larval developmental stages as well as pre-adult and adult stages that can move unrestricted on the host surface, enable manipulation of these parasites. For L. salmonis, recent advances in larval production systems and infection procedures (see Hamre et al. 2009) have been crucial for the establishment of defined laboratory strains of the salmon louse with different properties (e.g., drug-resistant strains, inbred strains). Stable and predictable production conditions further enable specific breeding to create various types of hybrids (e.g., susceptible and drug-resistant family groups). The improvement of rearing facilities has been a crucial facilitator for the establishment of RNAi in L. salmonis (Dalvin et al. 2009). Systemic RNAi is easily achieved in pre-adult or adult lice by injection of dsRNA into the animal. In addition, soaking free-living larval stages (e.g., copepodids) in dsRNA enables RNAi in copepodids (Campell et al. 2009). Furthermore, the genomes of both the Pacific and Atlantic variants of L. salmonis are currently being sequenced, and together with the present cDNA resources this will open up new avenues in sea lice research. There is a wide diversity of arthropod parasites, and good experimental parasite model systems are scarce. We anticipate that experimental studies on the salmon louse and other sea lice species will increase our knowledge about ectoparasites in general, particularly when more parasite genomes become available.
We sequenced over 150,000 ESTs from Pacific L. salmonis (49,672 new ESTs in addition to 14,994 previously reported ESTs), Atlantic L. salmonis (57,349 ESTs), C. clemensi (14,821 ESTs), C. rogercresseyi (32,135 ESTs), and L. branchialis (16,441 ESTs; Table 1). A relational database with an intuitive web interface was developed to process and display the large quantities of EST data, their assemblies and associated annotation information, as well as possible full-length gene information (Fig. 1). This database provides a novel resource for the study of sea louse biology, population genetics, and control strategies. This genomic resource represents the largest compilation for any copepod species and provides the material basis for the development of a 38K microarray that can be used in conjunction with our existing salmon 44K microarray to study host–parasite interactions at the molecular level.
The nuclear genes showed a high level of sequence divergence among the parasitic copepods examined: L. salmonis, C. clemensi, C. rogercresseyi, and L. branchialis (Table 2). In addition, whole mt genome sequences of two Caligus species, C. clemensi (13,440 bp) and C. rogercresseyi (13,468 bp), were determined and compared. The L. salmonis, C. clemensi, and C. rogercresseyi mtDNA genes also exhibited extensive sequence divergence, with identities among these species ranging from 66.7% to 68.8% at the nucleotide level and from 63.6% to 65.4% at the amino acid level (Table 3). Both nuclear and mtDNA genes showed very high levels of sequence divergence between these ectoparasitic copepods, which suggests that they have been in existence for 37–113 million years and that parasitic association with marine organisms is likely also quite ancient. In addition, while the order of the genes in the two Caligus mt genomes is the same, it differs from that in L. salmonis (Fig. 2). The large sequence divergence observed among these copepods may help to explain the extensive variety of morphology, life history, and host association in copepods.
This project (GiLS—Genomics in Lice and Salmon) was supported by Genome BC, Microtek Intl., Marine Harvest, Mainstream Canada, Greig Seafoods, and the University of Victoria. We would like to thank Rob Holt (Head of Sequencing, Genome Sciences Centre, Vancouver, BC, Canada), Richard Moore (Sequencing Group Leader, Genome Sciences Centre), Sarah Munro, Mike Mayo, and Susan Wagner (Genome Sciences Centre) for plating and sequencing. We also would like to thank John Burka (University of P.E.I., Canada), Frank Nilsen, and Heidi Kongshaug (University of Bergen, Norway) for Atlantic forms of L. salmonis; the Salmones Maullin Company (Chile) for C. rogercresseyi; Brendan Conners (Salmon Coast Field Station, Simoom Sound, BC, Canada) for C. clemensi; and James Bron and Sarah Barker (University of Stirling, Scotland, UK) for L. branchialis.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.