Background

Scolex type, external segmentation and internal proglottization are all important evolutionary characters of the Cestoda. The Amphilinidea and Gyrocotylidea (Cestodaria), which do not possess a scolex, are early divergent lineages in this class. Tapeworms of the order Caryophyllidea (Platyhelminthes: Eucestoda) are typified by a monozoic body (neither internal proglottization nor external segmentation). The Spathebothriidea are polyzoic but externally unsegmented, and all other eucestodes demonstrate classic proglottization (segmented body parts each with a set of reproductive organs). Morphological analysis shows the Caryophyllidea to be the earliest divergent lineage of Eucestoda [1], although phylogenetic analyses based on LSU rDNA and SSU rDNA have indicated that the Spathebothriidea may be the earliest diverging eucestodes [2, 3]. More recently, however, a topology constructed using large fragments of mtDNA supported the Caryophyllidea as the most primitive eucestodes [4]. These results indicate the Caryophyllidea to be a key group for studying evolutionary relationships within the Eucestoda as well as with the other parasitic flatworm groups Monogenea, Aspidogastrea and Digenea.

Owing to its maternal inheritance, lack of recombination and fast rate of evolution [5], the haploid mitochondrial genome has proven to be a useful marker for population studies, species identification and phylogenetics [6, 7]. Its genome-level characteristics, gene arrangements and the positions of mobile genetic elements also make it a powerful tool for reconstructing evolutionary relationships [8,9,10]. Using gene sequences and gene arrangements from the complete mt genome, the phylogenies of some parasitic Platyhelminthes have been reconstructed [11,12,13]. However, due to a paucity of complete mt genomic information from these groups, very few parasitic flatworms have been included in these phylogenetic analyses. Of the 16 orders of cestodes, only four (Diphyllobothriidea, Bothriocephalidea, Proteocephalidea and Cyclophyllidea) are currently represented in the GenBank database, and no complete mitogenome has been sequenced from the Caryophyllidea, the ancestral taxon of the Eucestoda.

Khawia sinensis Hsü, 1935, and Atractolytocestus huronensis Anthony, 1958, belong to the family Lytocestidae and are very common caryophyllideans in the intestine of the common carp (Cyprinus carpio). Both invasive tapeworms have a worldwide distribution and are translocated with the introduction of the common carp into countries around the world [14, 15]. Breviscolex orientalis Kulakovskaya, 1962, the only member of the family Capingentidae, is typically recorded in the cyprinid Hemibarbus barbus [16]. In addition, the Asian fish tapeworm Schyzocotyle acheilognathi (syn. Bothriocephalus acheilognathi), a segmented tapeworm of the Bothriocephalidea, is also an invasive parasite found worldwide.

This study therefore generated the complete mitogenomes of three caryophyllideans and of the Asian fish tapeworm, in order to analyse the phylogenetic relationships of eucestodes and the differences in gene arrangement between unsegmented and segmented eucestodes.

Methods

Specimen collection and DNA extraction

Four cestodes were collected from a fishery (29°59′10.47″N, 115°47′37″E) in Hubei Province, China: K. sinensis and A. huronensis from the common carp (Cyprinus carpio), B. orientalis from Hemibarbus maculatus and S. acheilognathi from the grass carp (Ctenopharyngodon idella). The parasites were preserved in 80% ethanol and stored at 4 °C. Specimens were stained with carmine and identified morphologically using the scolex and testis [16]. Total genomic DNA was extracted from the posterior region of a single tapeworm using a TIANamp Micro DNA Kit (Tiangen Biotech, Beijing, China), according to the manufacturer’s instructions. DNA was stored at −20 °C for subsequent molecular analysis. The morphological identification of specimens was verified by sequence analysis of the complete ITS1 rDNA region [17] and a partial sequence of the cox1 gene [18].

PCR and DNA sequencing

Partial sequences of the mtDNA from the four cestodes were initially amplified by PCR using degenerate primers (Additional file 1: Table S1). Using these fragments, specific primers were designed for subsequent PCR amplification (Additional file 1: Table S1). PCR reactions were conducted in a 20 μl reaction mixture, containing 7.4 μl molecular grade water, 10 μl 2 × PCR buffer (Mg2+, dNTP plus, Takara, Dalian, China), 0.6 μl of each primer, 0.4 μl rTaq polymerase (250 U/μl, Takara), and 1 μl DNA template. Amplification was performed under the following conditions: initial denaturation at 98 °C for 2 min, followed by 40 cycles at 98 °C for 10 s, 48–60 °C for 15 s, 68 °C for 1 min/kb, and a final extension at 68 °C for 10 min. PCR products were sequenced bidirectionally at Sangon Company (Shanghai, China) using the primer walking strategy.

Sequence analyses

The complete mt sequences were assembled manually and aligned against the mitogenome sequences of other published cestodes using the program MAFFT 7.149 [19] to determine the gene boundaries. Protein-coding genes (PCGs) were inferred with the help of BLASTX [20] and the SeqBuilder module in the Lasergene7 software package (DNASTAR), employing genetic code 9 (the echinoderm and flatworm mitochondrial code). The majority of tRNAs were identified by comparing the results of tRNAscan-SE [21], ARWEN [22], MITOS [23] and DOGMA [24]. However, tRNA Phe and tRNA Gln from B. orientalis and tRNA Gln from A. huronensis were identified by visual comparison with the sequences from other cestodes. The locations of the two ribosomal RNA genes, rrnL and rrnS, were determined through alignment with other available cestode mt sequences, and their ends were assumed to extend to the boundaries of their flanking genes. The 5′ end of the rrnL gene in S. acheilognathi, however, was determined from the alignments. MitoTool [25], an in-house program, was primarily used to parse the annotated mt genome into a Word document format, and to generate a *.sqn file for GenBank submission and a *.csv file for Table 1. MitoTool was furthermore employed to unify the names of all 36 genes (12 PCGs, 2 rRNAs and 22 tRNAs) and to locate all NCR positions (with a threshold of 50 bp) within the mitogenomes of the selected cestodes. Finally, the fasta file containing the nucleotide sequences and gene order for all 36 genes was extracted from the GenBank files, processed and used to generate Additional file 2: Table S2 and Additional file 3: Table S3. Repetitive regions within the NCRs were found using a local version of Tandem Repeats Finder [26]. The alignments located in highly repetitive regions (HRRs) were shaded and labelled using TEXshade software [27]. The secondary structure of each consensus repeat unit was predicted by Mfold software [28], and codon usage and relative synonymous codon usage (RSCU) were computed with MEGA 5 [29]. The CREx program [30] was then used to calculate the rearrangement events and to conduct pairwise comparisons of gene orders from all of the cestodes using the common intervals measurement.
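The NCR-location step described above can be illustrated with a minimal sketch. This is not MitoTool's code: the function name `find_ncrs`, the toy coordinates and the choice to treat the 50 bp threshold as "at least 50 bp" are all assumptions made for illustration. The sketch scans the intergenic gaps of a circular genome, including the gap that wraps from the last gene back to the first.

```python
def find_ncrs(genes, genome_length, min_len=50):
    """Report intergenic gaps of at least min_len bp on a circular genome.

    genes: list of (name, start, end) with 1-based inclusive coordinates.
    Returns a list of (upstream_gene, downstream_gene, gap_length).
    The ">=" comparison is an assumption; the source only states a 50 bp threshold.
    """
    ordered = sorted(genes, key=lambda g: g[1])
    ncrs = []
    for i, (name, _start, end) in enumerate(ordered):
        next_name, next_start, _ = ordered[(i + 1) % len(ordered)]
        if i + 1 < len(ordered):
            gap = next_start - end - 1
        else:
            # Wrap-around gap between the last and the first gene.
            gap = (genome_length - end) + (next_start - 1)
        if gap >= min_len:
            ncrs.append((name, next_name, gap))
    return ncrs


# Hypothetical toy annotation (not real coordinates from any mitogenome).
toy_genes = [
    ("cox1", 1, 300),
    ("trnT", 301, 365),
    ("rrnL", 490, 900),
    ("nad5", 905, 1400),
]
toy_ncrs = find_ncrs(toy_genes, genome_length=1500)
```

Overlapping genes produce a negative gap and are therefore never reported, which is the behaviour one would want for compact mitogenomes.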

Table 1 The annotated mitochondrial genome of the four cestodes

Phylogenetic analyses

Phylogenetic analysis was carried out using the mitogenomes generated from the four cestodes as part of this study as well as those of the 50 cestodes available from GenBank (Additional file 2: Table S2). Two trematodes, Dicrocoelium chinensis (NC_025279) and Dicrocoelium dendriticum (NC_025280), were used as outgroups. Another program written in-house, BioSuite [31], was employed to align all of the genes in batches using integrated MAFFT, wherein codon-alignment mode was used for the 12 PCGs, and normal alignment mode for the remaining genes (2 rRNAs and 22 tRNAs). The alignments were then concatenated to generate files in the widely supported Phylip and nexus formats for use in the phylogenetic analysis software. Both maximum likelihood (ML) and Bayesian inference (BI) were used to reconstruct phylogenetic trees, and selection of the most appropriate evolutionary model for the dataset was carried out using ModelGenerator v0.8527 [32]. Based on the Akaike information criterion, GTR + I + G was chosen as the optimal model for nucleotide evolution. ML analysis was performed in raxmlGUI [33] using an ML + rapid bootstrap algorithm with 1000 replicates. BI analysis was performed in MrBayes 3.2.1 [34] with default settings and 1 × 10^7 Metropolis-coupled MCMC generations. The tree was then annotated using iTOL (a web-based tool) [35] with the help of several dataset files generated by MitoTool.

Results

Genome organisation and base composition

The mitogenomes of A. huronensis (GenBank accession number: KY486754), B. orientalis (KY486752), K. sinensis (KY486753) and S. acheilognathi (CN) (KX589243) are circular double-stranded DNA molecules. Their sizes were 15,130 bp in A. huronensis, 14,620 bp in K. sinensis, 14,011 bp in B. orientalis, and 14,046 bp in S. acheilognathi (CN) (Fig. 1). The mitogenome of A. huronensis was the largest of all those available for cestodes (Additional file 2: Table S2, Fig. 2). The S. acheilognathi (CN) mitogenome was about 140 bp longer than the previously published one [36] owing to the presence of a longer NCR between nad5 and cox3. Like other flatworm mitogenomes [11], which lack the atp8 gene and encode all genes on the same strand, all of those generated in this study contained the standard 36 genes: 12 PCGs (atp6, cytb, cox1–3, nad1–6 and nad4L), 22 tRNA genes and two rRNA genes (Fig. 1). Intriguingly, the A-T content of the three Caryophyllidea species (K. sinensis, A. huronensis and B. orientalis) was the lowest of all published cestode mitogenomes (Fig. 2).
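For readers who wish to reproduce the A-T content comparison from a downloaded sequence, a minimal sketch follows. The function name `at_content` and the decision to ignore ambiguous bases such as N are assumptions; the source does not state how ambiguous positions were treated.

```python
def at_content(seq):
    """Return the A+T percentage of a nucleotide sequence.

    Ambiguous symbols (anything other than A, C, G, T) are ignored,
    which is an assumption rather than a documented choice of the study.
    """
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    at = sum(b in "AT" for b in bases)
    return 100.0 * at / len(bases)


# Hypothetical fragment, not taken from any of the four mitogenomes.
example = at_content("ATGTTTGGNCA")
```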

Fig. 1

Map of the mitochondrial genomes of Atractolytocestus huronensis, Breviscolex orientalis, Khawia sinensis and Schyzocotyle acheilognathi (China, CN). The 12 protein-coding genes (PCGs), 22 tRNA and two rRNA genes are depicted as well as the non-coding regions (NCRs)

Fig. 2

Maximum-likelihood tree inferred from 36 genes (12 protein-coding genes, 2 rRNAs and 22 tRNAs) of mitochondrial genomes of 54 cestode species from five orders, using two trematode species as outgroups. Scale-bar represents the estimated number of substitutions per site. Bootstrap/posterior probability support values of ML/BI analysis are shown above the nodes. Bar graphs (corresponding to tip labels in the tree) of the mitogenome length and A-T content are shown on the right of the tree

Protein-coding genes and codon usage

The size of the 12 PCGs ranged from 258 bp (nad4L) to 1554 bp (nad5) for the three caryophyllideans, but from 258 bp (nad4L) to 1584 bp (cox1) for S. acheilognathi (CN) (Additional file 3: Table S3). Only two types of start codons (ATG and GTG) were inferred from the sequence data of the four cestodes. GTG was used as a start codon for nad2, nad3, cox2, nad5 and nad6 in A. huronensis; for nad2, nad3, nad5 and nad6 in B. orientalis; and for nad4 and nad4L in S. acheilognathi (CN). The rest of the PCGs of these cestodes and all of the PCGs of K. sinensis used ATG as a start codon. Of the three predicted stop codons (TAG, TAA and the abbreviated stop codon T), TAG was the most frequent, followed by TAA and finally T. The abbreviated stop codon T was inferred for cox3 in A. huronensis, B. orientalis and S. acheilognathi (CN) and for cox2, cox3 and nad3 in K. sinensis (Table 1). RSCU values for the four cestode mtDNAs, calculated using the echinoderm mt genetic code, are presented in Additional file 4: Figure S1. Overall, the three most commonly used T-rich codons for the three Caryophyllidea cestodes (A. huronensis, B. orientalis and K. sinensis) were Val (GTT), Leu (TTG) and Phe (TTT), compared with Tyr (TAT), Leu (TTG) and Phe (TTT) for S. acheilognathi (CN).
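RSCU is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below illustrates the calculation only; it is not the MEGA 5 implementation. The function name `rscu` is hypothetical, and the `SYNONYMS` table is a deliberately partial subset of codon families (Phe, Tyr and Val) chosen for brevity, not the full genetic code 9 table.

```python
from collections import Counter

# Partial, illustrative codon families only; a real analysis needs the
# complete translation table 9 (echinoderm and flatworm mitochondrial code).
SYNONYMS = {
    "Phe": ["TTT", "TTC"],
    "Tyr": ["TAT", "TAC"],
    "Val": ["GTT", "GTC", "GTA", "GTG"],
}


def rscu(cds, synonyms=SYNONYMS):
    """Return {codon: RSCU} for every codon family observed in the in-frame CDS."""
    usable = len(cds) - len(cds) % 3          # drop a trailing partial codon
    codons = [cds[i:i + 3].upper() for i in range(0, usable, 3)]
    counts = Counter(codons)
    values = {}
    for family in synonyms.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue                          # amino acid absent: RSCU undefined
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values


# Hypothetical in-frame toy sequence: TTT TTT TTC GTT GTT GTT GTG
toy_rscu = rscu("TTTTTTTTCGTTGTTGTTGTG")
```

An RSCU above 1 indicates a codon used more often than expected under equal usage, which is how the T-rich codon preference reported above would appear.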

Transfer and ribosomal RNA genes

All 22 tRNAs from the mt genome of each of the four species were concatenated, giving total lengths of 1363 bp, 1378 bp, 1354 bp and 1404 bp for A. huronensis, B. orientalis, K. sinensis and S. acheilognathi (CN), respectively (Additional file 3: Table S3). Each tRNA identified from these four species could be folded into the traditional cloverleaf structure, with the exception of tRNA Ser(AGN) and tRNA Arg in B. orientalis, K. sinensis and S. acheilognathi (CN) and tRNA Ser(AGN), tRNA Arg and tRNA Cys in A. huronensis, which all lacked DHU arms (Additional file 5: Figure S2). All tRNAs had the standard anti-codons found in flatworms (Table 1), except tRNA Ser(AGN) in K. sinensis, which had an anti-codon of TCT. The two ribosomal RNA genes, rrnL and rrnS, were flanked by tRNA Thr and cox2 and separated by tRNA Cys. This arrangement was identical in all the cestodes for which a mitogenome was available (Additional file 6: Figure S3). The boundary of the rrnL gene for S. acheilognathi (CN) was redefined, making the gene approximately 100 bp shorter than in the previously published mitogenome [36]. This is due to a difference in how the boundary was defined (Additional file 7: Figure S4). As a result, there was an additional 124 bp NCR located between tRNA Thr and rrnL. Additionally, to conduct the phylogenetic analysis and linear order comparison (see later), we proposed a tRNA Gln annotation for a recently reported mitogenome from Testudotaenia sp. WL-2016 (KU761587) based upon alignments with other cestodes.

Non-coding regions

The positions of the NCRs in all cestodes were identified with a threshold value of 50 bp. The majority of cestodes contained two NCRs, except for Pseudanoplocephala crawfordi [37], Taenia crocutae [38], Taenia solium [39] and S. acheilognathi (CN), all of which had three NCRs, and Hydatigera taeniaeformis, which had just one NCR. These NCRs occurred in the junctions of rrnS-tRNA Arg (P1) and nad5-cox3 (P2) (Additional file 6: Figure S3). The lengths of the major NCRs were 873 bp (NCR1) and 1283 bp (NCR2) in A. huronensis, 549 bp (NCR1) and 1083 bp (NCR2) in K. sinensis, 208 bp (NCR1) and 825 bp (NCR2) in B. orientalis and 124 bp (NCR1), 166 bp (NCR2) and 499 bp (NCR3) in S. acheilognathi (CN). The concatenated size (2156 bp) of all NCRs from A. huronensis was the longest of all the cestodes (Additional file 3: Table S3). Various highly repetitive regions (HRRs) were detected in NCRs from the four cestode species, and the consensus repeats were capable of forming stem-loop structures (Fig. 3).

Fig. 3

Highly repetitive regions (HRRs) and the secondary structures of their consensus repeat units in the major non-coding regions (NCRs) of the mitochondrial genomes of Atractolytocestus huronensis (a), Khawia sinensis (b), Breviscolex orientalis (c) and Schyzocotyle acheilognathi (China, CN) (d). The thermodynamic value (dG) is shown under each secondary structure

Phylogeny and gene order

Both phylogenetic trees (BI and ML) showed high statistical support for the branch topology, especially at the order level (BP ≥ 85, BPP = 1). Since the two trees had the same topology, only the ML tree is shown (Fig. 2). The most derived Cyclophyllidea cestodes, together with the Proteocephalidea (represented by Testudotaenia sp. WL-2016), constitute a reciprocal monophyletic group with the Bothriocephalidea. This clade formed a sister-group to the Diphyllobothriidea, and all of these clades together were sister to the basal Caryophyllidea (Fig. 2). Breviscolex orientalis, belonging to the family Capingentidae, clustered with A. huronensis from the family Lytocestidae in a clade with maximum nodal support (BP = 100, BPP = 1), which in turn formed a sister-group relationship with another Lytocestidae species, K. sinensis.

Amongst the 54 mitogenomes across the five orders, each order had a unique arrangement except for the Cyclophyllidea which had two types: group 1 (represented by the Taeniidae) was identical to the Diphyllobothriidea, and group 2 (represented by the Hymenolepididae, Anoplocephalidae, Dipylidiidae and Paruterinidae) was identical to the Proteocephalidea. These corresponded to four mt gene arrangement categories: I, Caryophyllidea; II, Diphyllobothriidea and group 1; III, Bothriocephalidea; IV, Proteocephalidea and group 2 (Fig. 4). Pairwise analysis between the four gene arrangement categories indicated similarities (common intervals algorithm) in the gene order between unsegmented and segmented cestodes to be lower than within segmented cestodes (Table 2).
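The common intervals measure used for the pairwise comparison counts the gene sets that are contiguous in both gene orders, so that more shared blocks mean a higher similarity. A minimal sketch for two linear permutations of the same genes is given below. It is not the CREx implementation: the function name is hypothetical, only intervals of two or more genes are counted, and the circularity of the genome is ignored; all three are assumptions that may differ from the scores reported in Table 2.

```python
def count_common_intervals(order_a, order_b):
    """Count gene sets of size >= 2 that are contiguous in both linear orders.

    Both arguments must be permutations of the same gene names.
    """
    position_in_b = {gene: i for i, gene in enumerate(order_b)}
    n = len(order_a)
    count = 0
    for i in range(n):
        lo = hi = position_in_b[order_a[i]]
        for j in range(i + 1, n):
            p = position_in_b[order_a[j]]
            lo, hi = min(lo, p), max(hi, p)
            # The genes a[i..j] are contiguous in b exactly when they span
            # as many positions in b as they do in a.
            if hi - lo == j - i:
                count += 1
    return count


# Hypothetical four-gene orders, for illustration only.
same = count_common_intervals(["a", "b", "c", "d"], ["a", "b", "c", "d"])
swapped = count_common_intervals(["a", "b", "c", "d"], ["a", "c", "b", "d"])
```

Identical orders give the maximum count, and each rearrangement that breaks up shared blocks lowers it.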

Fig. 4

Rearrangement events predicted by CREx to explain gene order changes among the four mitogenome arrangement categories: Caryophyllidea (I), Diphyllobothriidea and Cyclophyllidea group 1 (II), Bothriocephalidea (III), Proteocephalidea and Cyclophyllidea group 2 (IV). L1, tRNA Leu(CUN); L2, tRNA Leu(UUR); S2, tRNA Ser(UCN); E, tRNA Glu; Y, tRNA Tyr; TDRL, tandem-duplication-random-loss

Table 2 Pairwise comparisons of mitochondrial DNA gene orders among the four categories of mitogenome arrangements (see Fig. 4)

Discussion

In the phylogenetic analysis employed in this study, the Caryophyllidea was resolved as the sister taxon to all other eucestodes in line with previous studies. Although only five orders of cestodes are included in the phylogenetic analysis, the evolutionary relationships remain consistent with the results generated through morphological examination [1] and sequence data obtained from large fragments of mtDNA [4].

The mitogenome gene order of the cestodes was extremely conserved. Amongst the 54 mitogenomes across the five orders, only four gene arrangement categories were found. With respect to the three types of gene arrangements (II, III and IV) in the segmented cestodes, all the rearrangement operations act on the four closely linked tRNA genes (tRNA Leu(CUN)-tRNA Ser(UCN)-tRNA Leu(UUR)-tRNA Tyr) (Fig. 4). When these are compared with category I in the unsegmented cestodes, there probably exists a long-distance transposition event (the three tRNA genes tRNA Leu(CUN)-tRNA Ser(UCN)-tRNA Leu(UUR) translocate to the 3′ end of the four genes cox2-tRNA Glu-nad6-tRNA Tyr) (Fig. 4), which may be the main cause of the low similarity value. According to the results of the CREx program, the gene rearrangements from category II to categories III and IV involve a tandem-duplication-random-loss (TDRL) event and a simple transposition event, respectively. A TDRL event can provide directional information, allowing the inference of the ancestral state from the comparison of only two taxa, because reversing the rearrangement would require more than a single operation [40]. On this basis (Fig. 4), category II may be the ancestral state relative to category III. Two categories of mt gene order were also found in the most derived Cyclophyllidea owing to the transposition of two tRNA genes [41]. However, the two types of gene arrangements are identical to those of the sister order Proteocephalidea and the relatively basal order Diphyllobothriidea.
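The directional property of a TDRL event can be made concrete with a small sketch. A segment is duplicated in tandem and one copy of each gene is then lost, so the result is the genes retained from the first copy followed by those retained from the second. The function names and the choice of which genes are retained are hypothetical; the example below uses the four tRNA labels from Fig. 4 but does not reproduce the actual event inferred by CREx.

```python
from itertools import combinations


def tdrl(order, keep_in_first_copy):
    """Apply one tandem-duplication-random-loss event to a gene segment.

    Genes in keep_in_first_copy survive in the first copy; all others
    survive in the second copy. Relative order within each copy is kept.
    """
    keep = set(keep_in_first_copy)
    first = [g for g in order if g in keep]
    second = [g for g in order if g not in keep]
    return first + second


def one_step_tdrl_results(order):
    """Enumerate every gene order reachable from `order` by a single TDRL."""
    reachable = set()
    for k in range(len(order) + 1):
        for keep in combinations(order, k):
            reachable.add(tuple(tdrl(order, keep)))
    return reachable


# Arbitrary illustrative event on the four linked tRNA genes.
before = ["L1", "S2", "L2", "Y"]
after = tdrl(before, {"S2", "Y"})
reversible_in_one_step = tuple(before) in one_step_tdrl_results(after)
```

In this toy case the derived order cannot be turned back into the starting order by a single TDRL, which is the asymmetry that allows an ancestral state to be proposed from only two arrangements.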

There may be further gene arrangements in other orders of cestodes; however, given the limited amount of mitogenome data available so far, we can only speculate. The rearrangement events observed among the four arrangement categories in this study all took place in P1, as mentioned above (Fig. 4), revealing a rearrangement hot spot. Interestingly, P1 is also the position in which one or two NCRs frequently occur, and highly repetitive regions (HRRs) are found within these NCRs. Whether an association exists between the rearrangement hot spot and the NCRs, and whether such an association is important in the evolution of cestodes, requires further investigation.

The phylogenetic relationship between B. orientalis and A. huronensis was found to be closer than that between A. huronensis and K. sinensis, which conflicts with classic systematics. On the basis of the paramuscular position of the vitelline follicles, B. orientalis is placed in the family Capingentidae Kulakovskaya, 1962, being the only member of this family found in the Palaearctic region. However, the fibres of the longitudinal musculature are situated mostly in the inner region of the vitelline field or entirely medullary to it, which is similar to the topography present in the Lytocestidae, which possess cortically situated vitelline follicles [42]. Breviscolex orientalis has a cuneiform scolex, as do both species of Caryophyllaeides Nybelin, 1922 in the Lytocestidae [16]. These observations suggest that the morphological characters of B. orientalis are closer to those of the Lytocestidae. Although the molecular result of this study is consistent with this view, relocation of B. orientalis, the only member of the family Capingentidae, into the family Lytocestidae needs more molecular support.

Conclusions

Among the four arrangement categories, the rearrangement events are detected in P1, where NCRs with highly repetitive regions (HRRs) are common. A putative long-distance transposition event is detected between the unsegmented and segmented cestodes. The TDRL event suggests that the mt gene arrangement of the Diphyllobothriidea is the ancestral state relative to that of the Bothriocephalidea. Gene arrangements of the Taeniidae and of the rest of the families in the Cyclophyllidea are found to be identical to those of the relatively basal order Diphyllobothriidea and the sister order Proteocephalidea, respectively.