Background

The typical insect mitochondrial (mt) genome is a circular, double-stranded DNA molecule of about 12–20 kb in length that contains 37 genes: 13 protein-coding genes (PCGs), 22 transfer RNAs (tRNA), and two ribosomal RNAs (rRNA) [1, 2]. However, some lineages of parasitic lice (Insecta: Phthiraptera) are notable exceptions. For example, several groups show extensively fragmented mt genomes, where the genes are separated onto multiple circular chromosomes, including in the families Haematopinidae [3], Hoplopleuridae [4, 5], Menoponidae [6], Pediculidae [7,8,9], Polyplacidae [10], and Trichodectidae [11]. Parasitic lice are currently divided into four suborders, namely Ischnocera, Amblycera, Rhynchophthirina, and Anoplura, based on morphological structure.

Lice are permanent, obligate, and often host-specific ectoparasites commonly found on birds and mammals. The suborder Ischnocera (approximately 3120 species) is currently divided into two families, Philopteridae (approximately 2600 species) and Trichodectidae [12]. Complete mt genomes of only a limited number of philopterid species have been sequenced: Bothriometopus macrocnemis [13], Campanulotes bidentatus compar [14], Campanulotes compar [11], Coloceras sp. [15], Falcolipeurus quadripustulatus [11], Ibidoecus bisignatus [15], Columbina picui, Columbina cruziana, and Columbicola columbae [16]. These studies have found extensive gene rearrangements in philopterid mt genomes. A recent report has demonstrated that highly fragmented, mt minicircles are present in four species of the genus Columbicola [16], indicating that fragmented mt genomes are more prevalent in parasitic lice than previously hypothesized [11]. These studies demonstrate that our knowledge of the structure in mt genomes of bird lice is far from comprehensive. Additional data is needed to understand the pattern and mechanisms of genome fragmentation and rearrangement in bird lice.

Owing to maternal inheritance, relatively high evolution rate, conserved gene components, and low rate of recombination, the mt genome has been widely used as a genetic marker for comparative, evolutionary, and phylogenetic analysis at different taxonomic levels [17, 18]. In this study, we (i) characterize the complete mt genome sequences of F. suturalis from the tawny eagle, (ii) compare it with that of F. quadripustulatus from the vulture, and (iii) assess the phylogenetic position of Falcolipeurus within the Philopteridae.

Methods

Sample collection and DNA extraction

Adult samples of F. suturalis were taken from a tawny eagle Aquila rapax in the Beijing Wildlife Rescue Center, China. Lice were washed twice with sterile physiological saline solution (0.85%), and initial identification as F. suturalis made based on morphology and host species (Fig. 1) [19], and then stored in 95% (v/v) ethanol at −40 °C. Total genomic DNA was extracted from 60 individual lice (30 females and 30 males) using the DNeasy Tissue Kit (Promega, Madison, USA) following manufacturer instructions. Two pairs of primers [20], mtd6-mtd11 and 12SA-12SB, were used to amplify fragments of cox1 (600 bp) and rrnS (350 bp) genes, respectively, for use as assembly baits [4].

Fig. 1
figure 1

Female (a) and male (b) Falcolipeurus suturalis (dorsal side and ventral side). 1: head, 2: antenna, 3: thorax, 4: legs, 5: abdomen

Sequencing, assembling and verification

The quality of extracted genomic DNA was tested by agarose gel electrophoresis [21] and DNA concentration were quantified by Qubit 2.0 Fluorometer (Thermo Scientific). A genomic DNA library (350 bp inserts) was prepared and sequenced by Novogene Bioinformatics Technology Co., Ltd. (Tianjin, China) using Illumina HiSeq 2500 (250 bp paired-end reads). Raw reads were filtered with PRINSEQ [22]. Preliminary cox1 and rrnS sequences of F. suturalis were used as initial references for assembly in Geneious v 11.1.5 (minimum overlap identity = 99%, minimum overlap = 150 bp, maximal gap size = 5 bp) [23]. Genome size and circular organization were verified by long PCR using four pairs of specific primers (Additional file 1: Table S1; Additional file 2: Figure S1).

Annotation and sequence analysis

Protein-coding and rRNA genes were identified by alignment to homologous genes of previously sequenced mt genome of the vulture louse F. quadripustulatus [11] using the MAFFT v7.122 software [24]. tRNA genes were identified using ARWEN [25] and tRNAscan-SE [26], with manual adjustment. Annotated mt genomes were illustrated using the visualize module of MitoZ [27]. Nucleotide composition and amino acid sequences of each protein-coding genes and codon usage were analyzed using MEGA v6.0 [28]. Asymmetry of base composition was calculated as the following formula: AT-skew = (AT)/(A+T), GC-skew = (GC)/(G+C) [29].

Phylogenetic analysis

A total of 23 mt genomes from lice, including ten species from the Philopteridae, three species from the Trichodectidae, one elephant louse, Haematomyzus elephantis, and eight species of sucking lice, were used for phylogenetic analysis [3, 5, 7, 9,10,11,12,13,14,15,16, 20, 30,31,32], with one wallaby louse, Heterodoxus macropus (GenBank: AF270939), used as an outgroup [33]. Amino acid sequences of 12 protein-coding genes (except for nad2 because it is unidentified in the H. elephantis mt genome) were aligned individually using MAFFT [23]. Alignments of the individual genes were concatenated into a single dataset. Ambiguously aligned areas were removed by Gblocks 0.91b with the option of less stringent selection [34], and subjected to phylogenetic analyses under Bayesian inference (BI) and maximum likelihood (ML). BI was conducted with four independent Markov chains run for 5,000,000 metropolis-coupled MCMC generations, sampling trees every 500 generations in MrBayes v3.2.7 [35]. The initial 25% (2500) trees of each MCMC were discarded as burn-in and the majority-rule consensus tree used to calculate Bayesian posterior probabilities (Bpp). For ML analysis, the alignment was partitioned by gene, and bootstrapping frequencies (Bf) performed using the rapid bootstrapping option with 100 iterations. The MtART (all 12 genes) model was selected as the most suitable model of evolution by ProtTest v2.4 based on the Akaike information criterion (AIC) [36]. ML analyses were computed using RAxML v2.2.3 [37]. Phylograms were drawn using FigTree v.1.31.

Results and discussion

Identity of the eagle louse F. suturalis

Seven louse species (F. suturalis, Degeeriella fulva, Degeeriella aquilarum, Nosopon chanabense, Colpocephalum impressum, Laemobothrion Laemobothrion vulturis, and Laemobothrion Laemobothrion maximum) can parasitize the tawny eagle A. rapax (http://phthiraptera.info/category/avian/aves/falconiformes/accipitridae/aquila/aquila-rapax).

F. suturalis can be identified by the following morphological characters: (1) body slender in deep dark brown colour; (2) head longer than wide with pointed anterior and slightly broad posterior; (3) dorsally, 3 dark brown spots on each side of preantennal area and 1 dark brown horizontal stripe near antennal area; (4) antennae sexually dimorphic, male antennae stout and long on segment 1, female antenna slender and short on all 5 segments; (5) thorax widened from top to bottom; (6) dorsal thorax with 1 pronotum and 2 pteronotums, pronotum larger than pteronotum, 1 stout short seta on posterior margin of each pteronotum; (7) ventral thorax, mesosternum with 1 pair dark brown spots near midlegs; (8) 3 pairs of legs slender, forelegs small, midlegs and hindlegs progressively larger; (9) abdomen longer than wide; (10) dorsal abdomen deep dark brown colour, 2nd segment slightly narrower than thoracic pteronotum and with 2 divided tergite plates; (11) tergites on dorsal abdominal segments 3 to 6 slightly narrow in middle on both male and female; (12) male with 2 divided tergite plates laterally and female with 1 tergite on each dorsal abdominal segment of 7 and 8; (13) ventral abdomen with 9 segments and in white colour, 1 pair of dark brown spots laterally on each segment of 2 to 9; (14) female with 1 small dark brown spot in inverted triangle shape in middle of ventral abdominal segments 8 and 9.

Genome organization

A total of 3.7 Gb of data (about 20-fold coverage) was obtained from the Illumina HiSeq 2500 platform, raw sequencing of 15,178,382 paired reads. Reads were cleaned (9,630,532 clean pairs) and assembled. The longest assembled contig was 16,659 bp in size and was the complete mt genome of F. suturalis (GenBank accession: MW696813). All 37 mt genes typical of metazoan mt genomes were present. Gene arrangement was distinct from that of F. quadripustulatus [11] and other philopterids [15]. Overall nucleotide composition was: A = 27.8%, T = 44.8%, C = 11.1%, G = 16.3%. All mt genes were encoded on the heavy strand, except tRNA-Arg. Three overlapping regions in the mt genome were observed: nad4L/nad1, tRNA-His/tRNA-Asp and tRNA-Asp/tRNA-Arg, ranging from 4 to 8 bp overlaps (Table 1). In additional, 22 intergenic regions were observed, ranging from 1 to 180 bp in size. The longest spacer was between tRNA-S2 and tRNA-G (Table 1).

Table 1 The organization of the mt genome of Falcolipeurus suturalis

Total A + T and G + C content of the complete mt genome was 73.0% and 27.0%, respectively, consistent with the nucleotide content of lice reported in previous studies [11, 15] (Table 2). Negative AT skew (−23.3) and positive GC skew (18.9) were found (Table 2), consistent with other louse mt genomes [11, 15]. All bird lice from the Philopteridae reported to date have strong strand asymmetry (GC skew between 6.3% and 38.1%) (Table 2).

Table 2 Nucleotide composition of the mt genomes of Philopteridae species, including that of Falcolipeurus suturalis

Annotation

As the mt genomes of parasitic lice can contain non-standard initiation codons [1, 5, 13], their identification can be challenging. In F. suturalis, all protein-coding genes used ATNs (ATA, ATG, ATT) or TTG as their initiation codons. Four genes (cox1, atp6, cox3, and cox2) use ATA, five (nad6, nad2, nad5, atp8, and nad1) use ATG, three (nad3, nad4L, and nad4) use ATT, and one (cytb) uses TTG (Table 1). All protein-coding genes had standard (TAA or TAG) or partial (TA or T) termination codons (Table 1). Eight genes (cox1, atp6, nad6, cox3, nad2, cox2, atp8, and nad4L) use TAA, three genes (nad4, nad3, and cytb) use TAG, nad1 uses TA, and nad5 genes use T (Table 1). Incomplete termination codons (TA or T) were found for nad1 and nad5 genes, which has been found in other bird lice, including B. macrocnemis (nad1) and F. quadripustulatus (nad5, nad6 and nad1). In the F. suturalis mt genome, rrnL was located between rrnS and tRNA-Ala, while rrnS was between tRNA-Asn and rrnL (Fig. 2; Table 1). The lengths of rrnS and rrnL were 726 bp and 1075 bp, respectively. tRNA genes varied from 61 to 71 bp in length (Table 1). All 22 tRNA genes can be folded into the canonical cloverleaf structure (Fig. 3), consistent with previous studies [31, 32]. Apart from the coding regions, we identified three non-coding regions. Non-coding region (NC1) (829 bp), located between nad4 and tRNA-LeuUUR, has the highest A + T content of 75.4% while non-coding region (NC2) (722 bp; A + T = 74.4%) is located between cox2 and nad5 and non-coding region (NC3) (425 bp; A + T = 75.1%) is located between tRNA-Gly and nad3 (Table 1).

Fig. 2
figure 2

The mt genome of Falcolipeurus suturalis. All genes are on the same DNA strand and are transcribed clockwise, except for tRNA-Arg (R). Protein-coding and rRNA genes are indicated with the standard nomenclature. tRNA genes are indicated with the one-letter code of their corresponding amino acids. There are two tRNA genes for leucine: L1 for codons CUN and L2 for UUR; and two tRNA genes for serine: S1 for codons AGN and S2 for UCN. “NCR1” refers to the first non-coding region. “NCR2” refers to the second non-coding region. “NCR3” refers to the third non-coding region

Fig. 3
figure 3

22 tRNA secondary structures from Falcolipeurus suturalis

Comparative analyses between F. suturalis and F. quadripustulatus

The entire mt genome of F. suturalis is 537 bp longer than that of F. quadripustulatus [11]. A comparison of nucleotide and amino acid sequences in each protein-coding gene of the two Falcolipeurus species is given in Table 3. Nucleotide sequence differences across the entire mt genome was 31.4%. The magnitude of nucleotide sequence variation in each gene between F. suturalis and F. quadripustulatus ranged from 13.2 to 27.5%. The greatest variation was observed in atp8 (27.5%), and the least difference was found in cytb (13.2%) (Table 3). For rrnL and rrnS, sequence difference is 28.4% and 14.6%, respectively, between F. suturalis and F. quadripustulatus (Table 3). Amino acid sequences inferred from individual mt protein genes of F. suturalis were also compared with those of F. quadripustulatus. Amino acid sequence differences ranged from 4.5 to 41.2%, with cox1 being the most conserved protein, and atp8 the least conserved (Table 3). Our results are consistent with other species-level comparisons in lice. For example, amino acid divergence in the 13 protein-coding genes of C. picui and C. cruziana was 5.5–50% [16], while between C. bidentatus compar and C. compar it was 0–37.3% [11, 14]. Taken together, the molecular evidence presented here supports that F. suturalis and F. quadripustulatus represent distinct louse species.

Table 3 Nucleotide (nt) and/or predicted amino acid (aa) sequence differences in mitochondrial genes among Falcolipeurus quadripustulatus (FQ) and Falcolipeurus suturalis (FS) upon pairwise comparison

Gene rearrangement

The mt genome arrangements of both Falcolipeurus species differ substantially from those of other species in the Philopteridae and from the putative ancestral insect (Fig. 4). Compared with the putative ancestral insect, no mt gene arrangements are shared with the mt genomes of Falcolipeurus species, as all genes are moved and/or inverted relative to their ancestral positions (Fig. 4). Of the 13 protein-coding genes, four (nad5, nad4, nad4L, and nad1) are inverted in both Falcolipeurus species relative to the putative ancestral insect. Only two gene blocks are shared between Bothriometopus and the putative ancestral insect: tRNA-Gly–nad3 and atp8–atp6 [13], while one gene block is shared between Campanulotes species and the ancestral insect: atp8-atp6 [11, 14, 33]. Three gene blocks, tRNA-Val–cox3, tRNA-Tyr–cox2, and tRNA-LeuCUNnad4L, are shared between Falcolipeurus and Ibidoecus [11]. Two gene blocks, nad4–tRNA-LeuUUR and tRNA-Gly–nad3, are shared between Falcolipeurus and Campanulotes. In the Philopteridae, only one gene block, tRNA-Ile–cox1, is shared across all Philopterids, excepting I. bisignatus, Columbina, and Columbicola. Such a lack of conserved gene arrangements in the mt genome of bird lice complicates the accurate reconstruction and identification of rearrangement events across their history [13].

Fig. 4
figure 4

Gene rearrangements of mitochondrial genomes of bird lice within the family Philopteridae

Usually, gene arrangement in mt genomes is very conserved between congeneric lice [11, 14]. Gene rearrangements between F. suturalis and F. quadripustulatus were identified consisting of at least one translocation (Fig. 5). The nad3 gene is located between cox2 and tRNA-Thr in F. quadripustulatus, but between tRNA-Gly and tRNA-LeuCUN in F. suturalis (Fig. 5). This gene rearrangement between the mt genomes of two Falcolipeurus species indicated that the rate of rearrangements of mt genes may vary substantially among closely related groups of lice [38]. It is interesting that two congeneric Falcolipeurus species differ by a rearrangement. Although this pattern has been found in Columbicola [16] and multiple Anoplura species [11], it is the first time that it has been found in a louse without fragmented genomes.

Fig. 5
figure 5

Gene rearrangement in two Falcolipeurus species

One tRNA gene (tRNA-Gly) was lacking, while the duplication of three genes (tRNA-Thr, tRNA-Tyr, and cox2) was found in the F. quadripustulatus mt genome [11]. However, all 37 genes have been identified in the F. suturalis mt genome. Gene duplications have also been reported in the mt genomes of several families in the class Insecta, such as the Reduviidae and Thripidae [39,40,41]. In addition, tRNA loss has been also found in the mt genome of several families of the class Insecta [11, 42].

Phylogenetic relationships

Phylogenetic analysis shows the genetic distinctiveness between F. suturalis and F. quadripustulatus (Bpp = 1.0; Bf = 100). The branch leading to the two Falcolipeurus species is much longer than the branch of two Columbina species. The genus Falcolipeurus is more closely related to the genus Ibidoecus than to other genera (Bpp = 1.0, Fig. 6; Bf = 100, Fig. 7), which was consistent with that of a previous study [11]. Ten species of the Philopteridae were included in this study. The Philopteridae was paraphyletic with strong support in Bayesian analysis (Bpp > 0.9, Fig. 6) and weak support in ML analysis (Bf > 17, Fig. 7).

Fig. 6
figure 6

Phylogenetic relationships among 10 species of the family Philopteridae inferred by Bayesian inference from deduced amino acid sequences of 12 protein-coding genes. One wallaby louse, Heterodoxus macropus as an outgroup. Bayesian posterior probabilities (Bpp) were indicated at nodes

Fig. 7
figure 7

Phylogenetic relationships among 10 species of the family Philopteridae inferred by Maximum likelihood from deduced amino acid sequences of 12 protein-coding genes. One wallaby louse, Heterodoxus macropus as an outgroup. Bootstrapping frequencies (Bf) were indicated at nodes

DNA sequencing provides the opportunity to further evaluate phylogenetic relationships in the Philopteridae. Phylogenetic relationships in the Philopteridae have been investigated with analyses of the nuclear gene sequences. For example, Cruickshank et al. analyzed elongation factor-1 alpha (EF-1α) sequences of 127 species from the four louse suborders, showing the Philopteridae to be paraphyletic [43]. Yoshizawa and Johnson analyzed mt 12S and 16S rDNA sequences of 18 species also showed the family to be paraphyletic [44]. However, Johnson et al. analyzed 1107 single-copy orthologous genes of 46 species and showed that the Philopteridae was monophyletic [45]. De Moya et al. analyzed 2370 orthologous genes and also showed that Philopteridae was monophyletic [46].

Mt genome sequences are effective molecular markers for the study of phylogenetic and systematic relationships at various taxonomic ranks across the phylum Arthropoda, including for ectoparasites [44,45,46,47,48]. Recently, a study has also indicted that mt genes can provide reliable reconstructions of evolutionary relationships in parasitic lice [6]. In the present study, the monophyly of the Philopteridae was rejected by mt genomic phylogenetic analyses. Song et al. inferred the high-level phylogeny of parasitic lice with the mt genome sequences of 25 species of parasitic lice, and showed that the Philopteridae was paraphyletic [11]. To date, the phylogenetic position of the Philopteridae within the Phthiraptera has not been confidently determined. Although mt genomic data has proven to be useful as genetic markers to explore the phylogenetic relationships among major lineages of parasitic lice [9, 11], mt genome sequences of many lineages are under- or not represented. Analyses of mt genome sequences in the current and previous studies has indicated that the Philopteridae was paraphyletic. However, many species of this family were not included across these studies [11]. Additional complete mt genomes of bird louse species representing multiple families should be included in future analysis to help resolve the phylogenetic position of the Philopteridae within the Phthiraptera.

Conclusions

The current study presents the entire mt genome of F. suturalis and compared it with the mt genome of F. quadripustulatus. Its gene order is rearranged relative to other Falcolipeurus species, and represents a new pattern within the Phthiraptera. These novel datasets will help to better understand the gene rearrangements and phylogenetic position of Falcolipeurus and provide useful genetic markers for systematics and phylogenetic studies of bird lice.