Background

Classification and phylogeny of the Monogenea have been debated for decades without reaching a consensus. Therefore, several classification systems, based on different morphological characters, coexist (e.g. [1,2,3,4,5,6,7]). These often use inconsistent terminology; e.g. Monogenea vs Monogenoidea, Monopisthocotylea vs Polyonchoinea, etc. (for further details see Table 1 in Mollaret et al. [8]). To simplify comparison with previous studies and discussion of our results, we follow the naming system of Boeger & Kritsky [4], with four exceptions: we use order rank for Tetraonchidea (which was split into two parts by Boeger & Kritsky: Tetraonchidae and Amphibdellatidae were assigned to Dactylogyridea, and the remaining Bothitrematidae and Tetraonchoididae were assigned to Gyrodactylidea), Monogenea instead of Monogenoidea, Monopisthocotylea instead of Polyonchoinea, and Polyopisthocotylea instead of Oligonchoinea.

Table 1 Annotated mitochondrial genome of Paratetraonchoides inermis

Paratetraonchoides inermis Bychowsky, Gussev & Nagibina, 1965 (Monogenea: Tetraonchoididae) is a parasitic flatworm usually found on the gills of fishes belonging to the family Uranoscopidae. It was originally assigned to the Tetraonchoididae (Tetraonchidea) based on several morphological characteristics: a considerably extended ribbon-shaped body, absence of eyes and middle hooks, and a digestive system typical for this family [9]. However, as the parasite uses 16 gyrodactylid-type (hinged) marginal hooks to attach itself to the gills of its host, it was later reassigned to the Gyrodactylidea by Boeger & Kritsky [4]. Concerning relationships among the orders Dactylogyridea, Tetraonchidea, Capsalidea and Gyrodactylidea (DTCG group henceforth), Bychowsky [1] argued that the ancestors of the Tetraonchidea were morphologically closer to the Gyrodactylidea than to the Dactylogyridea (note that capsalids were assigned to the Dactylogyridea in this classification). In contrast, the taxonomic classification presented by Lebedev [2] places the Tetraonchidea, together with the Dactylogyridea and Capsalidea, in the superorder Dactylogyria, whereas the Gyrodactylidea was elevated to the superorder level (Gyrodactylia). This classification was partly supported by Justine [5], based on spermatological data: sperm pattern 4 was found in the orders Tetraonchidea and Dactylogyridea, whereas sperm patterns 2a and 2c were found in the Capsalidea and Gyrodactylidea, respectively. The fact that these traits produce incongruent results, and that several decades of research have failed to produce a consensus on the phylogeny of the DTCG group, is a strong indication that morphological datasets may not provide a sufficiently robust basis for a comprehensive phylogenetic hypothesis for the DTCG group and the Monogenea.

With rapid advances in sequencing techniques, accompanied by an exponential increase in the availability of molecular data, molecular phylogenetics is becoming the tool of choice for resolving the evolutionary relationships within this group of animals [10, 11]. Despite this, the molecular data available for the DTCG group remain too limited and unbalanced (some orders are underrepresented) to infer their phylogenetic relationships with high resolution. For example, currently (July 2017), only 12 molecular records are available in the GenBank database for the Tetraonchidea. The only molecular marker used so far to study the phylogenetics of the DTCG group is 18S ribosomal RNA [11]. This marker produced a topology somewhat similar to Justine’s results [5]: (Dactylogyridea + Tetraonchidea) + (Monocotylidea + (Capsalidea + Gyrodactylidea)). However, as single-gene markers may not carry a sufficient amount of information to provide high phylogenetic resolution, future studies will probably rely increasingly on much more powerful phylogenomic approaches [12, 13].

Owing to the abundance of mitochondria in animal tissues, maternal inheritance, the absence of introns, the small size of genomes in metazoans, and an increasingly large set of available orthologous sequences, metazoan mitochondrial (mt) genomes have become a popular tool in population genetics [14], phylogenetics [15, 16] and diagnostics [12]. Comparisons of gene arrangements have garnered much scientific attention, as they carry a wealth of complementary information with potential applications in molecular systematics [17, 18]. Although gene order within neodermatan (i.e. parasitic flatworm) mt genomes is assumed to be remarkably conserved [16, 19, 20], several exceptions have been reported, including African/Indian schistosomes [19, 21], polyopisthocotylids [16, 22, 23] and a single monopisthocotylid [24].

Here, we sequenced, annotated and characterized the first tetraonchid mt genome sequence, belonging to P. inermis. We compared it structurally to all available (July 2017) neodermatan mt genomes and used all available monogenean mt genome sequences to reconstruct the phylogenetic relationships within the entire class.

Methods

Specimen collection and DNA extraction

Monogeneans were collected on 27th September 2016 from the gills of Ichthyscopus lebeck (Bloch & Schneider, 1801) obtained from Dong-He market in Zhoushan, Zhejiang Province, China (29°56′–40°62′N; 122°18′–12°30′E). Paratetraonchoides inermis was identified morphologically by the hard parts of the haptor (dorsal and ventral connective bars, marginal hooks) and reproductive organs (male copulatory organ and vaginal armament) [9] under a stereomicroscope and a light microscope. The parasites were preserved in 99% ethanol and stored at 4 °C. Total genomic DNA was extracted from about 120 entire parasites using the TIANamp Micro DNA Kit (Tiangen Biotech, Beijing, China), according to the manufacturer’s recommended protocol, and stored at -20 °C.

Amplification and sequencing

Amplification and sequencing were conducted as described previously [25], with minor modifications: partial sequences of the nad5, nad1, cox1, cox3, cox2 and rrnS genes were amplified by PCR using six degenerate primer pairs (Additional file 1). Based on these newly sequenced fragments, we designed specific primers for amplification and sequencing of the whole mitogenome (Additional file 1: Table S1). PCR was performed in a 20 μl reaction mixture containing 7.4 μl ddH2O, 10 μl 2× PCR buffer (Mg2+, dNTP plus, Takara, Dalian, China), 0.6 μl of each primer, 0.4 μl rTaq polymerase (250 U, Takara), and 1 μl of DNA template. Amplification was conducted under the following conditions: initial denaturation at 98 °C for 2 min, followed by 40 cycles of 98 °C for 10 s, 48–60 °C for 15 s and 68 °C for 1 min/kb, and a final extension at 68 °C for 10 min. PCR products were sequenced bi-directionally at Sangon Company (Shanghai, China) using the primer-walking strategy.
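The reaction recipe above can be checked and scaled with a few lines of arithmetic. The following is a minimal sketch, not the protocol actually used in the laboratory: the per-reaction volumes and the 1 min/kb extension rate come from the text, whereas the 10% pipetting overage and the function names are assumptions made for illustration.

```python
# Per-reaction volumes (microlitres) as listed in the text for one 20 ul PCR.
PER_REACTION_UL = {
    "ddH2O": 7.4,
    "2x PCR buffer": 10.0,
    "forward primer": 0.6,
    "reverse primer": 0.6,
    "rTaq polymerase": 0.4,
    "DNA template": 1.0,
}


def total_volume():
    """Sum of all components; should equal the stated 20 ul."""
    return sum(PER_REACTION_UL.values())


def master_mix(n_reactions, overage=0.1):
    """Scale every component except the template for n reactions.

    The overage (extra fraction to cover pipetting loss) is an assumption,
    not a value given in the text.
    """
    factor = n_reactions * (1.0 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in PER_REACTION_UL.items()
        if name != "DNA template"
    }


def extension_seconds(amplicon_bp, seconds_per_kb=60):
    """Extension time at 68 C for the stated rate of 1 min per kb."""
    return amplicon_bp / 1000 * seconds_per_kb


if __name__ == "__main__":
    print(f"total per reaction: {total_volume():.1f} ul")
    print(master_mix(10))
    print(f"1.5 kb amplicon: {extension_seconds(1500):.0f} s extension")
```

The template is left out of the master mix on purpose, since it differs between reactions and is added to each tube separately.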

Sequence annotation and analyses

After BLASTn analysis [26], the mitochondrial genomic sequence was assembled manually in a stepwise manner. The mt genome was aligned against the mt genomic sequences of other published monogeneans using MAFFT 7.149 [27] to determine approximate gene boundaries. The annotation was further fine-tuned using Geneious [28], adopting one capsalid mt genome, Neobenedenia melleni (MacCallum, 1927) (JQ038228), as the reference, and finally recorded in a Word document. Protein-coding genes (PCGs) were found by searching for ORFs (employing genetic code 9, echinoderm mitochondrial) and checking nucleotide alignments against the reference genome in Geneious. All tRNAs were identified using the ARWEN [29], DOGMA [30] and MITOS [31] web servers. The two rRNAs, rrnL and rrnS, were also preliminarily found using MITOS, and their precise boundaries were determined by alignment with closely related orthologs in Geneious. The NCBI submission file and tables with statistics for mt genomes were created using an in-house GUI-based program, MitoTool [32]. A nucleotide composition table was then used to make the broken line graph of skewness and A + T content in ggplot2 [33]. Codon usage and relative synonymous codon usage (RSCU) for the 12 PCGs of seven analyzed monopisthocotylids were initially computed with MEGA 5 [34], then further sorted using custom-made Python scripts [35], and finally imported into ggplot2 to draw the RSCU figure. Rearrangement events in the studied mt genomes and pairwise comparisons of gene orders of seven monogeneans were analyzed with the CREx web tool [36] using the common interval measurement.
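To illustrate what an ORF search under genetic code 9 involves, the sketch below builds NCBI translation table 9 (echinoderm and flatworm mitochondrial) and scans the forward strand only. It is a simplified stand-in, not the Geneious procedure used in this study: the function names and the `min_codons` threshold are assumptions, and ATG and GTG are accepted as start codons merely because these are the two reported for this genome.

```python
BASES = "TCAG"
# NCBI translation table 9; codons are ordered TTT, TTC, TTA, TTG, TCT, ...
# Differences from the standard code: AAA = Asn, AGA/AGG = Ser, TGA = Trp.
TABLE9_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNNKSSSSVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: TABLE9_AA[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
START_CODONS = {"ATG", "GTG"}  # assumption: only the starts reported here


def translate(seq):
    """Translate a nucleotide string codon by codon; unknown codons give X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def find_orfs(seq, min_codons=30):
    """Return (start, end, protein) for forward-strand ORFs in three frames.

    Coordinates are 0-based, end-exclusive, and include the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon in START_CODONS:
                start = pos
            elif start is not None and CODON_TABLE.get(codon) == "*":
                if (pos - start) // 3 >= min_codons:
                    # The initiator codon is reported as Met by convention.
                    protein = "M" + translate(seq[start + 3:pos])
                    orfs.append((start, pos + 3, protein))
                start = None
    return orfs


if __name__ == "__main__":
    print(find_orfs("ATGAAATGAAGATAA", min_codons=1))
```

Only the forward strand is scanned because all genes in this mt genome are transcribed from the same strand; a general-purpose tool would also check the reverse complement.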

Phylogenetic analyses

Phylogenetic analyses were conducted using amino acid sequences of PCGs of the newly sequenced mt genome (P. inermis) and all 17 monogenean mt genomes available in GenBank (Additional file 2: Table S2). Two species of the order Tricladida, Crenobia alpina (Dana, 1766) (KP208776) and Obama sp. MAP-2014 (NC_026978), were used as outgroups, as suggested in our previous study [25]. A fasta file with nucleotide sequences for all 12 PCGs was extracted from GenBank files and translated into amino acid sequences (employing genetic code 9, echinoderm mitochondrial) using MitoTool. All genes were aligned in batches with MAFFT, integrated into another GUI-based program written by us, BioSuite [37]. BioSuite was also used to concatenate these alignments and generate phylip and nexus format files. Phylogenetic analyses were conducted using maximum likelihood (ML) and Bayesian inference (BI) methods. Selection of the most appropriate evolutionary model for the dataset was carried out using ProtTest [38]. Based on the Akaike information criterion, MtArt + I + G + F was chosen as the optimal model for ML analysis, and LG + I + G + F for the BI analysis. ML analysis was performed in RaxML GUI [39] using a ML + rapid bootstrap (BP) algorithm with 1000 replicates. BI analysis was performed in MrBayes 3.2.6 [40] with default settings and 5 × 10^6 metropolis-coupled MCMC generations. Finally, phylograms and gene orders were visualized and annotated in iTOL [41] with the help of several dataset files generated by MitoTool.
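The concatenation step can be pictured as follows. This is a minimal sketch of building a supermatrix from per-gene alignments, not a reproduction of BioSuite: the function names, the gap-filling of genes missing from a taxon, and the relaxed PHYLIP layout are assumptions made for illustration.

```python
def concatenate(alignments, gap="-"):
    """Join per-gene alignments into one supermatrix.

    alignments maps gene name -> {taxon: aligned sequence}. A taxon that
    lacks a gene is padded with gaps (an assumption of this sketch).
    Returns (matrix, partitions), where partitions holds 1-based inclusive
    (gene, first_site, last_site) tuples.
    """
    taxa = sorted({taxon for aln in alignments.values() for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    offset = 0
    for gene, aln in alignments.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: aligned sequences differ in length")
        length = lengths.pop()
        for taxon in taxa:
            pieces[taxon].append(aln.get(taxon, gap * length))
        partitions.append((gene, offset + 1, offset + length))
        offset += length
    matrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return matrix, partitions


def to_phylip(matrix):
    """Write the matrix in relaxed PHYLIP form (name, space, sequence)."""
    n_sites = len(next(iter(matrix.values())))
    lines = [f"{len(matrix)} {n_sites}"]
    lines += [f"{taxon} {seq}" for taxon, seq in matrix.items()]
    return "\n".join(lines)


if __name__ == "__main__":
    toy = {
        "cox1": {"taxon_A": "MK-", "taxon_B": "MKL"},
        "nad1": {"taxon_A": "FF"},
    }
    matrix, partitions = concatenate(toy)
    print(to_phylip(matrix))
    print(partitions)
```

Keeping the partition boundaries alongside the matrix makes it possible to assign separate models per gene later, should that be desired.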

Results and discussion

Genome organization and base composition

The circular mitochondrial genome of P. inermis is 14,654 bp in size (GenBank accession number KY856918). The mitogenome comprises 12 protein-encoding genes (PCGs), 22 tRNA genes, two rRNA genes, and a major non-coding region (mNCR) (Fig. 1). The genome lacks the atp8 gene, which is common for flatworms [42]. All genes are transcribed from the same strand. Six overlaps and 22 intergenic regions were found in the genome (Table 1). Notably, the A + T content of the whole genome (82.6%), concatenated PCGs (81.9%), concatenated rRNA genes (81.1%), concatenated tRNA genes (82.6%), and even individual elements (three codon positions, two rRNAs, all 12 individual PCGs) of the genome are the highest reported among the monogenean mitogenomes characterized so far (Additional file 2: Table S2; Fig. 2). Nucleotide skewness of the mt genome (as well as its elements) did not differ from that of other monogeneans (Fig. 2).

Fig. 1 Visual representation of the circular mitochondrial genome of Paratetraonchoides inermis. Protein-coding genes (12) are red, tRNAs (22) are yellow, rRNAs (2) are green, and non-coding regions are grey

Fig. 2 A + T content and skewness of individual elements and the complete genome. Species are coloured according to their taxonomic placement at the order level
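For readers unfamiliar with the quantities plotted in Fig. 2, the sketch below computes A + T content and skewness for a single sequence. The text does not spell out the formulas, so the commonly used definitions are assumed here: AT skew = (A − T)/(A + T) and GC skew = (G − C)/(G + C). The function name is hypothetical and this is not the MitoTool implementation.

```python
from collections import Counter


def composition(seq):
    """Return A + T content (%) and AT/GC skew for a nucleotide sequence.

    Assumed definitions: AT skew = (A - T) / (A + T) and
    GC skew = (G - C) / (G + C). Ambiguous bases are ignored.
    """
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    total = a + t + g + c
    if total == 0:
        raise ValueError("sequence contains no A, T, G or C")
    return {
        "at_content": 100.0 * (a + t) / total,
        "at_skew": (a - t) / (a + t) if a + t else 0.0,
        "gc_skew": (g - c) / (g + c) if g + c else 0.0,
    }


if __name__ == "__main__":
    print(composition("AAATTGC"))
```

Under these definitions, a positive AT skew indicates more A than T on the analysed strand, and a negative value indicates more T than A.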

Protein-coding genes and codon usage

Concatenated PCGs were 9996 bp in size, with a notably high A + T content of 81.9%. This was also reflected in individual PCGs: from 75.1% (cox1) to 87.9% (nad2 and nad3) (Table 2). Apart from cox1 (which used GTG), all PCGs used ATG as the start codon. Among the stop codons, 11 out of 12 were TAA, whereas nad4L used TAG. No abbreviated stop codons (T--) were found (Table 1).

Table 2 Nucleotide composition and skewness of different elements of the studied mitochondrial genome

Codon usage, RSCU, and codon family proportion (corresponding to amino acid usage) were investigated among seven monopisthocotylid representatives (Fig. 3). Except for Tetrancistrum nebulosi (Young, 1967), the most abundant codon families were Leu2, Ile, and Phe. This is comparable to Lepidoptera [43] and Nemertea [44]. Notably, the studied mt genome exhibited a strong preference for the A + T-rich codon families (Phe, Ile, Leu2 and Asn, each > 10% in Fig. 3), whereas three codons mainly composed of G + C (CGC, GCG and CTG) were not found at all. This all corresponds well to the exceptionally high A + T bias of this mt genome. Overall, A + T-rich codons were favored over synonymous codons with lower A + T content among all seven considered monopisthocotylids (Fig. 3). This A + T preference is notably exemplified by the Leu2 family (TTA and TTG), where the TTA codon accounted for 86.92 ± 4.64%.

Fig. 3 Relative synonymous codon usage (RSCU) of seven monopisthocotylid representatives. Codon families are labelled on the x-axis. Values on the top of the bars denote amino acid usage
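The RSCU values summarised in Fig. 3 follow a simple definition: the observed count of a codon divided by the count expected if all synonymous codons of its family were used equally. The sketch below is an illustration under stated assumptions, not the MEGA 5 or custom-script workflow used in the study: it uses genetic code 9, treats Leu and Ser as two families each (split by the first two codon positions, so that TTA and TTG form Leu2), and all names are hypothetical.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
# NCBI translation table 9 (echinoderm and flatworm mitochondrial).
TABLE9_AA = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIIMTTTTNNNKSSSSVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: TABLE9_AA[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def family(codon):
    """Codon family label; Leu and Ser are split by their first two bases."""
    aa = CODON_TABLE[codon]
    return aa + codon[:2] if aa in "LS" else aa


def rscu(cds):
    """Return {codon: RSCU} for an in-frame coding sequence.

    RSCU = observed count / (family total / number of codons in family).
    Stop codons and codons with ambiguous bases are skipped; families that
    never occur in the sequence are omitted from the result.
    """
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    counts = Counter(c for c in codons if CODON_TABLE.get(c, "*") != "*")
    members = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            members[family(codon)].append(codon)
    values = {}
    for fam_codons in members.values():
        total = sum(counts[c] for c in fam_codons)
        if total == 0:
            continue
        expected = total / len(fam_codons)
        for codon in fam_codons:
            values[codon] = counts[codon] / expected
    return values


if __name__ == "__main__":
    print(rscu("TTATTATTATTG"))
```

In this toy input TTA occurs three times and TTG once, so TTA receives an RSCU of 1.5 and TTG 0.5; values above 1 indicate codons used more often than expected under equal usage.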

Transfer RNA genes

All 22 standard tRNAs were found (Table 1), and most of them exhibited the conventional cloverleaf structure. Exceptions were trnS1 (AGN) and trnC, which lacked DHU arms. The unorthodox trnS1 (AGN) is commonplace among all sequenced monogeneans [12, 15, 16, 22,23,24, 45,46,47,48,49,50,51], and possibly even all flatworms [42]. The unpaired DHU arm in trnC was also reported in most monogeneans apart from Pseudochauhanea macrorchis, M. sebastis, Polylabris halichoeres (Wang & Yang, 1998), T. nebulosi and N. melleni [16, 22, 23, 45, 52]. A slight preference for the A nucleotide (AT skewness, 0.013) was found in concatenated tRNAs of the P. inermis mt genome, which is a unique feature among the analyzed monogeneans (Fig. 2), all of the others exhibiting a preference for the T nucleotide.

Non-coding regions

The major non-coding region (mNCR), 1201 bp in size and located between nad5 and cox3, had a slightly higher A + T content (86%) than other parts of the genome (Table 2). Within the mNCR, there were two minor repetitive regions, both consisting of two repeats, 19 and 16 bp in size. Three tRNA-like cloverleaf structures were found in the mNCR (Additional file 3: Figure S1), among which trnS1-like and trnL1-like sequences contained modified standard anticodons (ACT and AAG, respectively), whereas trnS2-like had a standard TAA anticodon. Average sequence similarity values between the three tRNA-like pseudogenes and the corresponding functional monogenean tRNA homologs were low (41.45 ± 4.61% trnS1-like, 37.14 ± 3.84% trnL1-like, and 40.32 ± 6.01% trnS2-like), which indicates that they may not be functional. Such tRNA-like sequences were also observed in the mNCRs of many lepidopteran insects [53, 54]. These three tRNA-like genes could be a remnant of the tandem-duplication-random-loss (TDRL) process, and the associated heightened rates of substitutions and indels in duplicated genes. A similar hypothesis was put forward by Cameron [55] to explain the lepidopteran tRNA-like sequences. However, due to the limited data at our disposal, the functionality and presence of such tRNA-like sequences in other closely related species of the Tetraonchidea and other monogeneans remain speculative.

Phylogeny

Both phylogenetic analysis methods (BI and ML) produced phylograms with concordant branch topologies and high statistical support: all bootstrap support values were ≥ 88, and all Bayesian posterior probabilities were 1.0. Since both phylograms had the same topology, only the latter is shown (Fig. 4). Tree topology indicates the existence of two major clades: subclass Monopisthocotylea (Gyrodactylidea, Capsalidea, Tetraonchidea and Dactylogyridea) and subclass Polyopisthocotylea (Mazocraeidea). The Monopisthocotylea clade was further sub-divided into two clades, (Tetraonchidea + (Dactylogyridea + Capsalidea)) and Gyrodactylidea, both robustly supported (BP/BPP = 88/1 and 100/1, respectively).

Fig. 4 Phylogeny of the five orders inferred using concatenated amino acid sequences of 12 protein-coding genes. Scale-bar represents the estimated number of substitutions per site. Star symbol indicates that both methods produced the maximum statistical support value (BP = 100, BPP = 1.0); elsewhere both values are shown above the node as ML/BI. The number and distribution of hooks, number of anchors and spermatozoon patterns for the five orders are given to the right of the figure. The 14/12 + 2 in the first column refers to 14/12 marginal hooks + 2 central hooks

Based on the topology obtained in our phylogenetic analysis, the order Tetraonchidea appears to be much more closely related to the Dactylogyridea + Capsalidea clade than to the order Gyrodactylidea. This result is not fully congruent with any of the previously proposed classifications [1,2,3,4,5, 56]. As the entire order Tetraonchidea was represented by a single species of the Tetraonchoididae (P. inermis) in our study, this topology should be interpreted with some caution. This relationship appears to be supported by spermiogenetic and spermatozoal ultrastructural characters [5] (Tetraonchidea and Dactylogyridea possess sperm pattern 4, exhibiting one axoneme; Fig. 4), but as the Gyrodactylidea and Capsalidea both possess two axonemes (patterns 2a and 2c), we can conclude that sperm morphology and mitochondrial phylogenomics produce incongruent signals. Our results also appear to reject the validity of the taxonomic grouping of the Dactylogyridea, Tetraonchidea and Gyrodactylidea by the possession of 16 gyrodactylid-type marginal hooks, proposed by Bychowsky [1]. Similarly, the Tetraonchoididae were reassigned to the Gyrodactylidea by Boeger & Kritsky [4] mainly on the basis of hinged ‘gyrodactylid’ hooks, which is also incongruent with our results. Our results also do not support the basal phylogenetic position of the Capsalidea proposed by Boeger & Kritsky [4]. Finally, the results are also in disagreement with phylogenetic classifications based on molecular data: 18S ribosomal RNA sequences produced a topology in which capsalids were phylogenetically closer to gyrodactylids than to dactylogyrids [11].

Although our phylogenetic framework failed to reach a consensus with any of the previous studies, either those based on morphological or on molecular data, it provides important new insights into the evolutionary history of the four monogenean orders, the Gyrodactylidea, Dactylogyridea, Capsalidea and Tetraonchidea. Morphological traits are often believed to exhibit a high frequency of homoplasy, especially in (parasitic) microscopic animals, whether as a consequence of subjective, or merely simplistic, definitions of a character state (artifact) [57], or of convergent evolution caused by similar selection pressures on different taxonomic groups [58]. The existence of numerous incompatible phylogenetic hypotheses regarding the DTCG group and the entire class Monogenea [1,2,3,4,5,6,7, 56] presents excellent proof that wrong conclusions are often reached when poorly chosen or numerically insufficient morphological characters are invoked. Although molecular phylogenetics is a promising tool to address this issue [10, 11], future studies should rely on molecular markers that carry a sufficient amount of information to provide high phylogenetic resolution [12, 13].

Gene order

Paratetraonchoides inermis exhibits an extensive reorganization of tRNAs in comparison to all other sequenced monogenean mt genomes (Fig. 5). However, disregarding the tRNA genes, the order of PCGs and rRNA genes within its mt genome conforms to the common neodermatan pattern [20, 59]. A gene order similarity matrix (Additional file 4: Table S3) based on all 36 genes also indicates that P. inermis was the most dissimilar among the compared monopisthocotylids, even when A. forficulatus, which possesses a unique gene order [24], is included. The transformational pathway from P. inermis to the most similar gene arrangement, found in T. nebulosi and Paragyrodactylus variegatus (You, King, Ye & Cone, 2014), required two coupled transposition events, as well as three coupled long-range rearrangement operations, of which two were TDRLs and one was a transposition (Additional file 5: Figure S2). Disregarding the two “non-standard” mt genomes, P. inermis and A. forficulatus, the remaining monopisthocotylids exhibit a remarkably conserved gene order [20, 23].
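The idea behind a common-interval similarity between two gene orders can be sketched as follows. A common interval is a set of genes that sits contiguously in both orders, so more common intervals means more similar arrangements. This sketch is an assumption-laden simplification and not the CREx implementation: it treats both genomes as linear, ignores gene orientation (all genes here are on one strand), counts every interval of two or more genes including the full gene set, and uses a hypothetical function name.

```python
def common_intervals(order_a, order_b):
    """Count gene sets (size >= 2) that are contiguous in both orders.

    Both orders must list the same genes exactly once. Genomes are treated
    as linear sequences, which is a simplification for circular mt genomes.
    """
    if len(set(order_a)) != len(order_a) or sorted(order_a) != sorted(order_b):
        raise ValueError("orders must contain the same genes exactly once")
    position_in_b = {gene: index for index, gene in enumerate(order_b)}
    count = 0
    n = len(order_a)
    for i in range(n):
        lowest = highest = position_in_b[order_a[i]]
        for j in range(i + 1, n):
            p = position_in_b[order_a[j]]
            lowest, highest = min(lowest, p), max(highest, p)
            # Genes a[i..j] are contiguous in b exactly when their positions
            # in b span a window of the same width.
            if highest - lowest == j - i:
                count += 1
    return count


if __name__ == "__main__":
    reference = ["cox1", "trnT", "rrnL", "trnC"]
    rearranged = ["cox1", "rrnL", "trnT", "trnC"]
    print(common_intervals(reference, reference))   # identical orders
    print(common_intervals(reference, rearranged))  # one swapped pair
```

For four genes, identical orders share all six intervals, whereas swapping the two middle genes leaves four; the quadratic loop is adequate for mt genomes of about 36 genes.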

Fig. 5 The 21 unique gene orders in neodermatan mitochondrial genomes filtered from 107 species. Representatives and the corresponding taxonomic category at the class/subclass level are shown on the left; star symbol denotes that the gene order is shared by Monogenea and Cestoda. Pattern types used here to classify gene orders are indicated on the right. Also, see Additional file 6: Figure S3

For a better comparison of gene order among neodermatans, we used MitoTool and iTOL to extract and visualize all available sequences for species of the Cestoda and Trematoda (Additional file 6: Figure S3) and filter out all non-unique gene orders. This resulted in a set of 21 unique gene orders: 11 for the Monogenea, four for the Cestoda (also see our recent publication [60]) and seven for the Trematoda (note that the eleventh gene order in Fig. 5 was shared by the Cestoda and Monogenea). This indicates an exceptional plasticity in the mitochondrial gene order in the Monogenea, as they represent merely 16.8% (18/107) of all available neodermatan mt genomes, but account for 52.4% (11/21) of all unique gene orders. In general, gene order within neodermatan mt genomes is relatively conserved: all of the Cestoda [60] and a majority of the Trematoda mt genomes exhibited only minor variations in the tRNA order (Fig. 5; pattern 1a). The only exceptions were the African/Indian schistosomes (pattern 2, with interchange of two gene blocks: nad5-cox3-cytb-nad4L-nad4 and atp6-nad2, and interchange of nad1 and nad3; Fig. 5) and the monogenean subtaxon Polyopisthocotylea (pattern 3). This was also observed by Webster et al. [19], but they did not have the latest two sequenced monopisthocotylid mitogenomes (P. inermis and A. forficulatus) at their disposal. The majority of Monopisthocotylea species do indeed exhibit the most common gene order pattern 1a, but these two mt genomes both exhibit extensively altered gene orders: A. forficulatus exhibits pattern 4, with a transposition of an entire gene block [24]; and the newly sequenced P. inermis is the sole representative of pattern 1b, exhibiting an extensive reshuffling of tRNA genes (Fig. 5).
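Filtering out non-unique gene orders requires that two circular genomes written from different starting genes are recognised as identical. The sketch below shows one way this could be done, by rotating every order to begin at a fixed anchor gene before grouping. It is not the MitoTool routine: the choice of cox1 as anchor, the function names, and the species labels in the example are assumptions, and strand is ignored on the premise that all genes lie on one strand.

```python
def canonical(order, anchor="cox1"):
    """Rotate a circular gene order so that it starts at the anchor gene."""
    genes = list(order)
    start = genes.index(anchor)
    return tuple(genes[start:] + genes[:start])


def unique_gene_orders(genomes, anchor="cox1"):
    """Group species by canonical gene order.

    genomes maps species name -> list of gene names in genome order.
    Returns {canonical order: [species sharing it]}.
    """
    groups = {}
    for species, order in genomes.items():
        groups.setdefault(canonical(order, anchor), []).append(species)
    return groups


def share(part, whole):
    """Percentage rounded to one decimal, as reported in the text."""
    return round(100.0 * part / whole, 1)


if __name__ == "__main__":
    toy = {
        "species_1": ["cox1", "trnT", "rrnL"],
        "species_2": ["rrnL", "cox1", "trnT"],  # same circle, other start
        "species_3": ["cox1", "rrnL", "trnT"],  # genuinely different order
    }
    for order, species in unique_gene_orders(toy).items():
        print("-".join(order), species)
    print(share(18, 107), share(11, 21))
```

The `share` helper simply reproduces the two proportions quoted above (18 of 107 genomes and 11 of 21 unique gene orders).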

We hypothesise that the most common pattern (1a) might be the primitive gene order (plesiomorphy) from which patterns 1b, 2, 3 and 4 were derived. This hypothesis is in agreement with previous studies [19, 21, 61], which considered the gene order of Asian schistosomatid mt genomes as the ancestral state, and the gene order of African/Indian schistosomatids a derived trait. Two non-standard gene patterns (4 and 1b) contradict the two hypotheses regarding the putative monogenean plesiomorphic gene arrangement atp6-nad2-trnV-trnA-trnD-nad1-trnN-trnP-trnI [23], and the discriminating markers between Monopisthocotylea and Polyopisthocotylea: rrnL-trnC-rrnS and trnN-trnP-trnI-trnK [48] (Table 3).

Table 3 Occurrence of gene blocks in the five proposed gene order patterns

Such rare gene arrangements are believed to be a promising tool for molecular systematics and phylogenetic reconstruction because mitochondrial gene order reversal events are very rare, and unique orders rarely occur independently in separate lineages [18]. However, the latest two sequenced mt genomes (P. inermis and A. forficulatus) show that monopisthocotylids do not possess a synapomorphic gene order, and instead suggest that gene order within this group may be evolving at uneven rates. This can create misleading evolutionary signals, which was observed before in some taxonomic groups [62,63,64,65]. Thus, while taking into consideration that our insight is curbed by the sparsity of available mt genomes, this finding provides a strong note of caution to researchers wishing to use gene order rearrangements as a tool for neodermatan phylogeny.

Conclusions

Despite the limited availability of molecular data, our analysis provides three findings particularly worth noting. Firstly, there is no support for the sister-group relationship between the Gyrodactylidea and Tetraonchidea [1], nor for the allocation of the family Tetraonchoididae to the Gyrodactylidea [4]. Instead, the Tetraonchidea exhibits a phylogenetic affinity with the Dactylogyridea + Capsalidea clade, which indirectly supports Lebedev’s traditional classification [2]. Secondly, the order Capsalidea is neither basal within the subclass Monopisthocotylea [4], nor does it form a sister group with the Gyrodactylidea [10, 11]. Instead, it forms a sister clade with the Dactylogyridea, which is in full agreement with the two latest mitochondrial phylogenomic studies [24, 25] and lends further support to the traditional classifications by Bychowsky [1] and Lebedev [2]. Thirdly, the mitogenome of P. inermis provides several interesting findings from the genomic perspective as well: the unprecedentedly high A + T content of the entire genome and its elements, three tRNA-like sequences found in the mNCR, and a unique gene order. The latter indicates that gene order within monopisthocotylids may be evolving at uneven rates, thus creating misleading evolutionary signals. Heightened AT bias can confound phylogenetic inference [66], and the inclusion of only a handful of representatives for three orders (one for Tetraonchidea, two for Dactylogyridea and three for Capsalidea) in our analyses severely limits the phylogenetic resolution. Therefore, we are currently not able to generate a comprehensive phylogenetic hypothesis for the high-level phylogeny of the subclass Monopisthocotylea, nor to conduct tests rigorous enough to reject or accept with confidence the hypotheses put forward by previous studies. Denser sampling and the use of strategies alleviating potential compositional biases are needed to evaluate our phylogenetic results and resolve the phylogeny of monogeneans.
Our work offsets the scarcity of molecular data for the order Tetraonchidea to some extent, providing a base both for the future fragmentary dataset studies (morphology data and single gene sequence-based molecular markers), as well as the future mitochondrial phylogenomics studies.