Introduction

Acoel flatworms are small, soft-bodied, unsegmented, marine animals lacking a gut epithelium, coelomic cavity, and anus. Instead, they typically possess a ventral mouth opening, and a simple syncytial digestive system1. Due primarily to the common attributes of acoelomate body and the absence of a through gut, Acoela were traditionally grouped as an order within the Platyhelminthes. The first molecular systematic studies on these animals using small subunit (SSU) ribosomal RNA gene sequences revealed that the Acoelomorpha are in fact a distinct lineage, quite separate from the main clade of the Platyhelminthes (Rhabditophora and Catenulida)2,3,4. Instead, these initial molecular studies supported a position of the Acoelomorpha diverging prior to the protostome/deuterostome common ancestor. More recently, the Acoelomorpha have been linked to the similarly simple marine worm Xenoturbella in the new phylum Xenacoelomorpha, making sense of their shared simple body plan and other shared morphological characters, such as unusual ciliary ultrastructure5 and their simple basiepidermal nervous system6. Despite considerable efforts, the position of Xenacoelomorpha within the Metazoa remains unresolved, with alternative lines of evidence placing them either as the sister group to the remaining Bilateria (protostomes and deuterostomes)7, 8, or as a phylum within the deuterostomes9. A better understanding of acoel phylogeny and evolution is therefore integral to answering central questions concerning the evolution of Bilateria and its subtaxa. To this end more genomic data are needed.

Metazoan mitochondrial DNA (mtDNA) is a closed-circular molecule typically comprising 37 genes which are, for the most part, invariant across the Metazoa10. These include the two rRNAs of the mitochondrial ribosome, 22 tRNAs necessary for translation, and 13 protein-coding genes for the enzymes of oxidative phosphorylation. atp8 is the only gene known to have been commonly lost from this complement, and this has been observed in a number of independent metazoan lineages, including the acoel Symsagittifera roscoffensis 11. In addition to primary sequence data, mtDNA has a number of other features which can be used for phylogenetic inference, including variations in mitochondrial genetic code12; a higher rate of sequence evolution than nuclear DNA13; changes in gene order; and changes in the secondary structure of rRNAs and tRNAs14.

Mitochondrial gene sequences have been used extensively for phylogenetic inference. In a recent paper, Rouse et al. used mitochondrial protein-coding sequence data from four newly discovered species of Xenoturbella (X. hollandorum, X. churro, X. monstrosa, and X. profunda) to infer the internal phylogeny of the Xenoturbellida15. Wider phylogenetic inference including mitochondrial proteins from these species placed Xenacoelomorpha with the deuterostomes15, corroborating previous mitochondrial phylogenetic analysis of this phylum9, 16, 17.

Mitochondrial gene content is largely invariable across the Metazoa, with the order in which genes are arranged being fairly stable and conserved for up to hundreds of millions of years in some metazoan lineages. Rearrangement events, thought to occur via a model of ‘duplication and deletion’14, 18, whereby a portion of the mitochondrial genome is duplicated, and the original copy of the duplicated gene subsequently deleted, are rare. The infrequency of such rearrangements, and the huge number of possible rearrangement scenarios, means that convergence on the same gene order in unrelated lineages is unlikely. Gene order is thus likely to retain evolutionary signals, with a common gene order being indicative of common ancestry and informative for the study of metazoan divergence19. Rearrangement of genes within the mitochondrial genome of different species can be a particularly powerful tool in the analysis of phylogenetic relationships14 and may also indicate accelerated evolution in a taxon.

In this study, we describe the mitochondrial genomes from three species of acoel: Paratomella rubra, Isodiametra pulchra and Archaphanostoma ylvae. Adult specimens of all animals are approximately 1 mm in length, and, as is typical for small acoel species, they occupy the littoral and sub-littoral zones of marine ecosystems: P. rubra has been described across Europe and North America20, 21, and I. pulchra lives abundantly in the mud flats of Maine22. A. ylvae has been described from the West coast of Sweden23. All species move freely within the sediment by gliding on a multiciliated epidermis. First described by Rieger and Ott21, P. rubra is an elongate and flattened worm belonging to the family Paratomellidae24, 25. A 9.7 kb fragment of mitochondrial genome has previously been described from specimens of P. rubra collected on the Mediterranean coast of Spain26. I pulchra belongs to the family Isodiametridae; it can be maintained long-term in culture and has been used experimentally for in situ hybridisation, RNAi, and other molecular protocols22, 27, 28. It’s use as a ‘model acoel’ therefore makes this species particularly valuable for investigation. A. ylvae also belongs to the Isodiametridae family of acoels. Originally described by Kånneby et al. in 2014, its cox1 gene has been sequenced and used for classification, but no further genes from its mitochondrial genome have been sequenced23.

Results

Genomic composition

We assembled 14,954 base pairs of the P. rubra mitochondrial genome, starting from three genome assembly fragments and using Sanger sequencing of additional PCR fragments (Fig. 1a). We were unable to close the circular mitochondrial genome of P. rubra, but our 14.9 kb sequence contains all 13 protein-coding genes, both ribosomal genes and 22 putative tRNAs. Compared to the fragment of the genome previously published we have found four additional protein-coding genes and 12 additional tRNAs26. All genes are found exclusively on one strand of the sequence. Allowing for overlap, protein-coding genes account for 74.79% of the genomic sequence; ribosomal genes 13.95%; tRNA genes 9.10%, and non-coding DNA just 2.04%. A 156 nucleotide-long stretch of non-coding sequence is found between cytochrome c oxidase subunit 2 (cox2) and NADH dehydrogenase subunit 1 (nad1).

Figure 1
figure 1

Overview of the mitochondrial genome sequences we resolve for Paratomella rubra, Isodiametra pulchra and Archaphanostoma ylvae (Xenacoelomorpha: Acoela). Genes not drawn to scale. Numbers beneath the sequences show intergenic spaces (positive values) or intergenic overlap (negative values). Protein-coding genes are denoted by three letter abbreviations; ribosomal genes by four letter abbreviations. tRNAs are shown by single uppercase letters. (A) P. rubra 14,957 base-pair long sequence. All genes found on the positive (forward) strand. Where genes, rRNAs or tRNAs are coloured orange, this is solely to demonstrate overlap with adjacent genes, rRNAs or tRNAs. (B) I. pulchra 18,725 base-pair long sequence. Genes found on the positive (forward) strand are coloured blue; genes on the negative (reverse) strand are coloured purple. Non-coding sequence shown in grey. (C) A. ylvae 16,619 nucleotide-long mitochondrial genome. Genes found on the positive (forward) strand are coloured blue; genes on the negative (reverse) strand are coloured purple. Non-coding regions greater than 100 nucleotides in length are shown in grey.

In P. rubra, trnS2 is predicted entirely within the sequence coding for nad1, and also has clear deviation from the traditional ‘cloverleaf’ secondary structure of tRNA. In addition, three of the predicted tRNAs have minor overlaps with protein-coding genes: trnA with nad3 (20 nucleotides); trnK with nad4l (18 nucleotides) and trnS1 with nad4 (six nucleotides); and all but five nucleotides of trnL1 are predicted within the same sequence as rrnL (Fig. 1a, Table 1). With the exception of trnT, all predicted tRNAs have an amino-acyl acceptor stem composed of seven base pairs, and all predicted tRNAs apart from trnT and trnS2 have a five base pair anticodon stem (Fig. 2). 11 tRNAs have one or two G-T mismatches in their acceptor or anticodon stems (A,C,G,I,K,L1,L2,P,Q,R,T). All tRNAs have a DHU arm of three or four nucleotides. The structure of the TψC arm shows greater variability, with a number of tRNAs having either a truncated stem, or the arm entirely lacking (Fig. 2).

Table 1 Organisation of the Paratomella rubra 14.9 kb mitochondrial genome sequence. All genes found on the ‘positive’ strand.
Figure 2
figure 2

Predicted secondary structures of tRNAs from the mitochondrial genome sequence of Paratomella rubra as predicted by MiTFi in Mitos.

For I. pulchra, we initially recovered a 13 kb contig, a 3.5 kb contig and a 19 kb contig of mitochondrial sequence from our transcriptomic data. The entire 13 kb contig and 2.4 kb of the 3.5 kb contigs were found to be perfectly matching subsets of the longer 19 kb sequence (Fig. 3). We designed several sets of PCR primers to verify the sequence between the 3′ end of the 13 kb and 5′ end of the 3.5 kb fragments found on the long 19 kb fragment (Fig. 3), however, no PCR amplification completely bridged the sequence between the 13 kb to 3.5 kb fragment. We found that the last (3′) 300 bp of the 13 kb fragment was duplicated in the opposite orientation within the end (3′) region of the 3.5 kb fragment. Although the long 19 kb fragment contained the repeated region between the 13 kb and 3.5 kb fragments, we were not able to connect sequences on both sides of the repeated region by PCR. The placement of Sanger sequencing fragments containing the repeat remained ambiguous. We are thus not confident in the assembly of the 19 kb transcriptomic sequence in this section, and therefore treat it as uncertain. Instead we focused on verifying the sequence of the two smaller fragments and on amplifying and sequencing the region lying between them. We reconfirmed the majority of the 13 kb fragment using PCR amplification and Sanger sequencing. We were also able to amplify and sequence fragments joining the 3′ end of the 3.5 kb fragment with the 5′ end of a 1.3 kb fragment containing the rrnL gene, which we had identified in transcriptome sequence data using blast. This contig included the duplicated region at the 3′ end of the 3.5 kb fragment (Fig. 3).

Figure 3
figure 3

Overview of the initial transcriptome assembly fragments and PCR strategy for scaffolding the Isodiametra pulchra mitochondrial genome. 1.3 kb, 13 kb and 3.5 kb fragments aligned to a continuous 19 kb fragment, with the location of the duplicated sequence in the 13 kb and 3.5 kb fragments shown by blue dashed lines. The ‘start’ and ‘end’ regions of the 13 kb and 3.5 kb fragments are annotated by 5′ (start) and 3′ (end). The approximate location of cob and nad1 protein-coding sequence are shown for reference. Reliable PCR-amplicons are shown in orange; the green PCR fragment indicates successful joining of the 3′ end of the 3.5 kb fragment to the rrnL fragment, including the duplicated section. The 18,725 base-pair long sequence we resolve is indicated by the pink lines, from ‘start’ to ‘end’.

In summary, we find the I. pulchra mitochondrial genome to have a span of at least 18,725 base pairs (Fig. 1b) based on our PCR validation of the transcriptomic data. This covers the region from the start of the 5′ end of the 3.5 kb sequence which is linked through PCR amplicons to the 5′ end of the 13 kb sequence (including the 1.3 kb rrnL contig), and up to the start of the duplicated sequence at the 3′ end of the 13 kb sequence (Fig. 3). As we were not able to bridge the region between the 3′ end of the 13 kb fragment and the 5′ end of the 3.5 kb fragment with PCR, we could not confirm the validity of the duplicated sequence at this position nor fully close the circular mitochondrial genome. It is therefore likely that the entire mitochondrial genome is larger than 19 kb, and may include the duplicated sequence. The sequence we are confident on presenting contains both ribosomal genes, all tRNAs and 11 protein-coding genes. These protein-coding genes and RNAs are encoded on both the plus and minus strands. No sequences resembling either atp8 or nad4l could be found in our sequence.

In the 18.7 kb sequence, protein-coding genes account for 56.66%; ribosomal genes contribute 8.15% and tRNA genes 7.77%. Compared to the P. rubra and S. roscoffensis mitochondrial genomes, intergenic space in the I. pulchra sequences is unusually high: non-coding DNA accounts for 22.72% of the sequences, including 14 intergenic regions of greater than 100 base pairs.

We identified all 22 expected tRNAs in the I. pulchra mitochondrial genome. Predicted sequences for rrnS and trnI overlap by four base pairs, but no other overlaps were found between any tRNAs or with any protein-coding genes (Fig. 1b, Table 2). All predicted tRNAs have an amino-acyl acceptor stem composed of seven base pairs and a five base pair anticodon stem, with the exception of trnE, trnF and trnS2, which have an anticodon stem composed of only four base pairs (Fig. 4). The structure of the DHU arms and TψC show greater variability, and are composed of either 3 or 4, or between 3 and 6, base pairs respectively, across the 22 tRNAs. Whilst the TψC arm is missing entirely in trnQ, and very truncated in trnE, trnF, trnG and trnP, more of the predicted tRNAs fit the stereotypical ‘cloverleaf’ secondary structure than has been found for other acoel species, including S. roscoffensis and P. rubra (Fig. 4).

Table 2 Organisation of the Isodiametra pulchra 18.7 kb mitochondrial genome sequence.
Figure 4
figure 4

Predicted secondary structure of tRNAs from the mitochondrial genome sequences of and Isodiametra pulchra as predicted by MiTFi in Mitos.

The complete, closed circular mitochondrial genome of A. ylvae was recovered from genome sequencing data of P. rubra specimens collected from Yorkshire, UK. Contamination of the P. rubra samples was confirmed using NCBI Blast, which yielded a 99% identical sequence to the published A. ylvae cox1. The complete A. ylvae mitochondrial genome is 16,619 nucleotides in length, and contains 12 protein-coding genes, both rRNAs, and 22 predicted tRNAs (Fig. 1c, Table 3). With cox1 at the start of the genome on the ‘positive’ strand, all other protein-coding genes apart from cox3 and nad2 are found on the ‘negative’ strand. Both rRNAs are found on the positive strand, and tRNAs are distributed between the two. Accounting for a small amount of overlap between genes - totalling 18 nucleotides of overlap across the whole genome - protein-coding genes make up 64.72% of the genome. tRNAs contribute 8.91%, and rRNAs 9.31%. As found for I. pulchra, non-coding DNA makes up a large amount of the genome, totalling 17.17%.

Table 3 Organisation of the Archaphanostoma ylvae 16.6 kb mitochondrial genome.

We identified possible sequences for all 22 mitochondrial tRNAs in the A. ylvae genome, although four of these (trnE, trnI, trnK and trnS1) have an e-value prediction of greater than 0.0001. All predicted secondary structures of the tRNAs in the A. ylvae mitochondrial genome have standard-length acceptor and anticodon stems, and the majority – with the exception of trnK, L1, L2, N, S2 and Y – have a four nucleotide-long D-loop (Fig. 5). As in the other acoel mitochondrial genomes, the greatest variability is found in the TψC arm, which is truncated in trnD, E, F, L1, P, V, W and Y and missing in trnQ (Fig. 5).

Figure 5
figure 5

Predicted secondary structure of tRNAs from the mitochondrial genome sequence of Archaphanostoma ylvae as predicted by MiTFi in Mitos.

The P. rubra genome is 78.15% A + T rich, which is higher than the A + T content calculated for the I. pulchra genomic sequence at 67.28%, and the A. ylvae complete genome at 74.70%. Overall nucleotide usage on the plus strand of P. rubra is: A = 29.29%; T = 48.86%; C = 6.77% and G = 15.10%; GC-skew = 0.38 and absolute AT-skew = 0.25. Overall nucleotide usage for I. pulchra is: A = 34.04%; T = 33.24%; C = 16.45% and G = 16.27%; GC-skew = 0.006 and AT-skew = 0.012. For A. ylvae, A = 40.41%; T = 34.29%; C = 12.82% and G = 12.47%; GC-skew = 0.014 and AT-skew = 0.082. GC-skew and AT-skew absolute values for P. rubra are much higher than that of S. roscoffensis, although the absolute values for I. pulchra and A. ylvae are comparatively low11. AT-skew value for the P. rubra sequence is just 0.01 different from that of the published partial P. rubra genome, and GC-skew is slightly higher (published P. rubra GC-skew = 0.32)26.

Gene order and gene arrangement

All thirteen protein-coding genes in P. rubra have complete initiation codons: ATA (x5) and ATT (x8). Five of the protein-coding genes previously published differ in the nucleotide sequence of their start codons: nad2, atp8, cox2, and cox3 all have ATA as an initiation codon in our analysis, compared to ATT found in previous analysis26. Twelve of the genes have full stop codons: TAA (x9) or TAG (x3). atp8 was found to have a truncated stop codon (TA-), which is thought to be completed during post-transcriptional modification (Table 1). The eleven protein-coding genes found for I. pulchra also have full initiation codons: ATA (x4) and ATG (x7). Eight of the genes for this species have full stop codons: TAA (x7) and TAG (x1); nad6, atp6 and cob are inferred to have truncated stop codons (Table 2). Initiation codes in A. ylvae are: ATA x4, ATG x7 and ATT x1; all genes have TAA as stop codons, with the exception of nad6, which has TAG (Table 3). As in other invertebrate mitochondrial genomes, our data indicates a deviation from the ‘standard’ genetic code, with ATA encoding the start codon methionine, M, instead of isoleucine, I.

We found all genes in P. rubra on the ‘plus’ strand. In I. pulchra, genes are distributed over the plus and minus strands, with just two ‘blocks’ of genes with the same transcriptional polarity clustered together (rrnL-rrnS-trnI-trnP-trnY; trnS2-nad5-trnW-nad2). Similarly, in A. ylvae genes are distributed across the two strands, with two clustered ‘blocks’ of genes and tRNAs (nad4l-nad6-nad4-trnT-cox2-nad3-trnV-trnY; trnM-trnH-trnQ-trnG-trnS1-trnI-trnF-trnA-trnL1-trnE-trnC). Whilst the P. rubra mitochondrial sequence has a large degree of overlap between adjacent genes, the opposite is true for I. pulchra and A. ylvae. Unlike other metazoan mitochondrial genomes, where genes are adjacent or overlapping and one or two larger non-coding regions are commonly found, I. pulchra non-coding sequence is found consistently between protein-coding genes and between tRNAs, ranging in length from eleven to 277 base pairs. In addition, three long non-coding regions of 788, 1143 and 717 base pairs are found at the start of our sequence; between trnL1 and trnN; and trnN and rrnL. The A + T content of these three sections are 68.78%, 65.79% and 76.15% respectively. The compositional difference between the 717 base pair non-coding region and the rest of the genome is statistically different (χ2 = 25.629, p < 0.0001), with a higher A + T content indicating that it could function as a transcriptional control region. The A + T content of the 788 base pair non-coding region is not significantly higher than the rest of the sequence (χ2 = 0.85). Similarly, there is a large portion of intergenic, non-coding sequence in the A. ylvae genome. Eight regions of non-coding sequence greater than 100 base pairs are found throughout the genome, with 24 additional smaller intergenic regions, ranging in size from 3 to 70 base pairs. Of the larger non-coding sequences, three have an A + T content that is statistically higher than the entire sequence: 309 nucleotides between cox1 and nad4l2 = 3.944, p < 0.1); 126 nucleotides between nad4l and nad62 = 6.964, p < 0.01) and 260 nucleotides between trnS2 and rrnL2 = 9.654, p < 0.01). The P. rubra sequence has just one longer non-coding intergenic sequence, of 196 base pairs.

As was already indicated by the 9.7 kb published partial genome26, the gene arrangement we found in P. rubra is unique amongst published metazoan mitochondrial genomes. Similarly, neither I. pulchra nor A. ylvae show any similarity to other published metazoan mitochondrial genomes (Fig. 6). The species analysed in this study share only the small ‘block’ of nad3-atp6-nad4-cob (I. pulchra) and cob-nad4-nad3 (P. rubra). However, the order is reversed between the two, and the genes are distributed across both strands in I. pulchra, so it is unlikely that this represents a feature inherited from a common ancestor. To quantify the number of common gene arrangements between the species in this study and other mitochondrial genomes, protein-coding gene and ribosomal RNA gene order was analysed using CREx29 (compared to the acoel S. roscoffensis, the xenoturbellid Xenoturbella bocki, and the metazoan mitochondrial ‘ground plan’, represented by L. polyphemus). Conserved mitochondrial gene ‘blocks’ (that is, a series of genes, regardless of their order within the grouping) were very infrequent between the species. Of the genomes compared, the highest number of common gene blocks was found between X. bocki and P. rubra. This result was not significant, finding only 16 common intervals out of a possible 150, and confirming the visual observation that gene order between these species is highly variable.

Figure 6
figure 6

Comparisons of gene orders in the mitochondrial genome sequences resolved for Paratomella rubra, Isodiametra pulchra and Archaphanostoma ylvae compared to a published P. rubra fragment; the acoel Symsagittifera roscoffensis; the xenoturbellid Xenoturbella bocki; the nemertodermatid Nemertoderma westbladi and the metazoan mitochondrial ‘ground plan’ gene order, represented by Limulus polyphemus. Genes are not drawn to scale. Coloured genes chosen to show ‘anchors’ and divergence from the ground plan gene order in other species.

Phylogenetic analysis, and population differentiation

We used our new mitochondrial data from P. rubra, I. pulchra, and A. ylvae to investigate the internal phylogeny of the acoels and to test support for an Acoela-Xenoturbellida affinity. We observed that including the fast-evolving tunicates into our phylogeny leads to a clustering of these species and the acoels in an artificial long-branched clade (Supplementary Figure S1). Removing the tunicates, Bayesian phylogenetic inference was carried out using the protein-coding genes of P. rubra, I. pulchra and A. ylvae on a trimmed concatenated amino acid alignment, including the additional species listed in Supplementary Table S2. The data set reached a MaxDiff of 0.17 after 39,525 trees were sampled across 10 chains discarding the first 400 trees (per chain) as burnin and sampling every 10th tree. In both, this and the maximum likelihood approaches, the protostome/deuterostome split was correctly inferred and Xenacoelomorpha were found splitting off inside Deuterostomia. P. rubra, I. pulchra, and A. ylvae were grouped inside Acoela, as expected (Fig. 7).

Figure 7
figure 7

Bayesian (using PhyloBayes53) and Maximum Likelihood (using RAxML51) phylogenetic analysis of mitochondrial protein-coding genes from the Metazoa, including P. rubra, I. pulchra and A. ylvae with posterior probability and bootstrap support values, respectively, at relevant nodes. Analysis carried out on trimmed alignment. Topology of both trees is identical.

We detected the Nemertodermatida species Nemertoderma westbladi inside Mollusca and we therefore conclude that the limited data available on Genbank for this species is a contamination.

Having access to the published partial P. rubra mitochondrial sequence from a population sampled near Barcelona (Spain) and our own samples from Yorkshire, UK, we could estimate total sequence divergence and compare non-synonymous to synonymous substitutions in eight protein-coding genes found on the Spanish fragment to the same genes from the Yorkshire mitochondrial genome. We found the 9.7 kb sequences to be 82.62% similar at the nucleotide level. The number of substitutions varied between, for example, 23 in the shortest gene alignment (atp8; 177 bp), to 161 in nad2 (972 bp), and 116 in the 1401 bp long cox1 alignment (Supplementary Table S3). Notably, non-synonymous substitutions appear to be frequent with, for example, 13 in atp8, 104 in nad2, and 25 in cox1 (see Supplementary Table S3). Similarity of the cox1 sequences on the nucleotide level is only 91% over 666 bp, thus higher when compared to species pairs (Supplementary Table S4), but lower than the 95–98% percent threshold used to distinguish species in cox1 based barcoding30.

Discussion

The 14.9 kb sequence of the P. rubra mitochondrial genome determined in our analysis contains the full complement of 37 genes typical of metazoan mitochondrial DNA. Numerous lab-based and computational efforts to close the circular mitochondrial genome were unsuccessful, but the complete gene complement and length of our final sequence indicates that this fragment covers the majority of the P. rubra complete mitochondrial genome. The difficulty we encountered in attempting to close the circular mitochondrial sequence may be attributed to the AT-rich, repetitive sequence found at both ends of the fragment, which could have prevented PCR amplification. Similar regions have been shown as problematic in studies of other mitochondrial genomes31. As no long stretch of non-coding sequence was found for this species in our study, the missing sequence might represent its mitochondrial transcription control region. Nonetheless, the overall AT content of the P. rubra mitochondrial sequence (78.15%) is high even for mtDNA, and greater than the A + T content of the mitochondrial genome of the acoel S. roscoffensis (75.3%)11 and the published partial P. rubra genome (76.4%)26.

The validity of the duplicated sequence found in our analysis of the I. pulchra mitochondrial genome could not be confirmed by PCR or computational efforts to map short reads to resolve it. Duplications within mitochondrial genomes are not uncommon, and changes to mitochondrial gene order are widely thought to arise as a result of a sequence ‘duplication and deletion’ mechanism14, 32, 33. A number of mitochondrial genomes with duplicated sequences have been reported in species with a divergent mitochondrial gene order33,34,35. Given the highly unusual gene order of the I. pulchra mitochondrial genome, a genomic duplication such as this could provide evidence for a genomic ‘duplication and deletion’ rearrangement of genes. The rearrangement and separation of protein-coding genes in other mitochondrial genomes has been attributed to long, non-tandem, inverted repeats34, and this could be true for I. pulchra. Furthermore, very long nematode mitochondrial genomes with variable duplicated regions have been found with a conserved region containing the majority of the protein-coding genes36: in I. pulchra the protein-coding genes and tRNAs, with the exception of nad1 and trnD and L1 are found in one long region, outside of the duplicated section. However, long non-coding duplications are frequently adjacent to tRNAs or other sequences capable of forming stem-and-loop structures37. This is not true for the potential duplicate in I. pulchra. Most puzzling, both occurrences of the duplicate are identical, nucleotide-by-nucleotide, and unless the duplication occurred exceptionally recently, it is likely that spontaneous mutations would result in differences between the two copies of the sequence, especially given the elevated mutation rate of mitochondrial genomes. While it is true that the duplicated sequences appear at the start and end point of transcriptome assembly contigs, meaning it is possible that the duplication observed occurred only as a result of a sequencing and assembly error, their existence is nevertheless supported by PCR products which show an identical sequence being adjacent to both rrnL and to cob.

The 14.9 kb mitochondrial genome of P. rubra, the 18.7 kb sequence from I. pulchra and the complete 16.6 kb A. ylvae mitochondrial genome show no significant organisational similarity to any other published metazoan mitochondrial genome (Fig. 6). Comparison of the 14.9 kb P. rubra sequence with the published 9.7 kb P. rubra fragment shows an identical protein-coding and ribosomal gene order, but with variation in tRNA order (Fig. 6). tRNAs are reported to show much more frequent gene translocation compared to larger genes38, which could account for these discrepancies. This, and the differences on the nucleotide level, including the relatively low level of similarity in the cox1 barcoding gene might indicate that P. rubra collected from Barcelona (Spain)26 and our animals from Yorkshire (England) should be regarded as cryptic species and not just divergent populations. Given the large and mostly unresolved diversity in benthic communities39, and the marine environment in general40, a differentiation into (cryptic) species cannot be seen as surprising.

All species analysed in this study are unique in the orientations and orders of their genes: P. rubra has genes exclusively on one strand; I. pulchra has an almost-equal distribution of genes across both the plus (18 genes) and minus (17 genes) strand; with cox1 in a ‘forward’ orientation at the start of the genome, the majority of the protein-coding genes for A. ylvae are found on the minus strand. Furthermore, genes in I. pulchra and A. ylvae are not clustered into groups of ‘gene blocks’ on the same strand, but are found frequently as one or two genes on each strand. The finding of a unique gene order for these species seems to be typical for the acoels: analysis of the complete S. roscoffensis mitochondrial genome found no gene order similarity to any other species published to date, suggesting great variability in mitochondrial gene order amongst the acoels11.

In addition to an unusual gene order, the mitochondrial genome of P. rubra shows frequent overlaps between protein-coding genes and tRNAs. tRNAs have been reported within protein-coding genes in other metazoan mitochondrial genomes41, 42, and given that no other location could be predicted for these sequences, this overlap could represent the simultaneous coding for both tRNAs and protein-coding genes. Overlap in coding sequence could be the result of selection to reduce genome size, accompanied by a reduction in non-coding sequence42, and truncated tRNAs with incomplete secondary structure43, both of which are also found for the P. rubra sequence. The opposite is true for the I. pulchra and A. ylvae sequences. For I. pulchra, the sequence we could confidently verify makes the minimal length of the I. pulchra mitochondrial genome 18,725 base pairs, and it is likely to be longer in the complete closed circular genome. As has been found for other ‘long’ mitochondrial genomes, this is largely due to a large portion of the genome being non-coding44. The lengths of protein-coding genes inferred for I. pulchra are similar to those of other acoel species (Table 4), and in addition, two protein-coding genes (atp8 and nad4l) appear to have been lost from the genome, contributing to a reduced proportion of protein-coding gene sequence within the genome. The loss of atp8 is not unusual, and has been reported in a number of unrelated taxa, as well as S. roscoffensis and in our A. ylvae mitochondrial genome11. The absence of nad4l is more unusual, and could be a result of its existence in a portion of the genome that we have been unable to sequence. Although non-coding sequence contributes a relatively large proportion of the A. ylvae mitochondrial genome (17.17% compared to 22.72% in the I. pulchra sequence), the total genome is not exceptionally long.

Table 4 Length of protein–coding genes in acoel mitochondrial genomes. All gene lengths in base pairs.

The internal phylogeny we resolve for Acoela is in line with that proposed by Jondelius et al.25. I. pulchra and A. ylvae group together in the family Isodiametridae. Isodiametridae groups with S. roscoffensis, Neochildia fusca and Convolutriloba longifissura, which are all members of the Convolutidae. P. rubra forms a separate branch outside the Convolutidae, representing the Paratomellidae. We interpreted the initial grouping of the acoels and tunicates as a classical example of long branch attraction (LBA) (Supplementary Figure S1). The accelerated substitution rates in mitochondrial DNA are also evidenced by the cryptic divergence we find in P. rubra, and may well lead to LBA in phylogenies derived from mitochondrial protein-coding genes, owing to the clustering of rapidly evolving lineages. This is of particular relevance for acoel species, which already demonstrate a very rapid rate of nucleotide substitution compared to other metazoans, leaving them vulnerable to LBA. Excluding the urochordates, we do retrieve Xenacoelomorpha as a branch of the deuterostomes, as expected (Fig. 7).

The mitochondrial genome sequences we analysed for the acoel species P. rubra, I. pulchra and A. ylvae have very divergent gene orders compared to other metazoan species. Furthermore, these species have very different mitochondrial features: a large amount of genomic overlap in P. rubra, and a lot of non-coding sequence in I. pulchra and A. ylvae. It is also possible that the mitochondrial genome of I. pulchra has a non-tandem inverted duplication - which could provide a mechanism for gene order variation - but this could not be confirmed by lab or computational based methods. Although limited to four species, the uniqueness of acoel mitochondrial genomes analysed so far11, 26 means that gene order and other mitochondrial genome features may not be phylogenetically informative for this order, although further mitochondrial genomes from other members of the Acoela would no doubt aid in this comparative analysis. Similarly, the cryptic divergence found between P. rubra samples from Yorkshire and Barcelona illustrate the usefulness of studying mitochondrial genomes to understand hidden species diversity. Our data clearly emphasise the still problematic placing of Xenacoelomorpha, with the clade firmly placed inside deuterostomes, but LBA drawing the acoels towards the outgroup when the tunicates were included. In summary, more data from genomes of early branching taxa are needed to resolve phylogenetic and biological questions.

Methods

Specimen collection, DNA extraction and PCR

Live Paratomella rubra specimens were isolated from sand samples collected from Filey, North Yorkshire and were immediately frozen and stored at −70 °C following identification. Specimens of Isodiametra pulchra were cultured in petri dishes with nutrient-enriched f/2 sea water and fed ad libitum on Nitzschia curvilineata diatoms. DNA was extracted from live specimens of I. pulchra and frozen specimens of P. rubra using the QIAamp DNA Micro Kit (Qiagen: Product No. 56304) with the manufacturer recommended protocol.

All PCRs were done using the GeneAmp PCR System 2700 (Applied Bioscience). PCRs were carried out using the Expand Long-Range PCR Kit (Roche Applied Sciences: Product No. 11681834001), following manufacturer recommendations for 50 μl reaction set-up. General cycling protocol was: 92 °C for 2 min; 15 cycles of: 92 °C for 10 sec, 57 °C for 15 sec, 68 °C at initial elongation time (approximated as 1 min per 1000 base pairs to be amplified); 2 cycles each of: 92 °C for 10 sec, 57 °C for 15 sec, 68 °C at 40 sec longer than initial elongation time, repeated at increasing 40 sec intervals for a further 14 cycles; a final elongation stage at 68 °C for 7 min and a 4 °C ‘hold’ stage. Where PCRs were not successful using this protocol, they were repeated using the Q5 High-Fidelity PCR Kit (New England Biolabs: Product No. E0555L), following manufacturer recommendations for a 25 μl reaction. Cycling protocol was: 92 °C for 2 min; 40 cycles of: 92 °C for 10 sec, 58 °C for 15 sec, 68 °C at initial elongation time (approximated as 1 min per 1000 base pairs to be amplified); a final elongation stage at 68 °C for 7 min. Amplified products were visualised on ethidium-bromide stained TAE 0.8% gels. Bands of expected size were purified using the High Pure PCR Product Purification Kit (Roche Applied Sciences: Product No. 11732668001) and sent for sequencing by Source BioScience Life Sciences. Only amplifications that resulted in one clear band on the TAE 0.8% agarose gel were sequenced.

Three fragments of sequence from the mitochondrial genome of P. rubra, of size ~5.8 kb, ~4 kb, and ~1.2 kb, were generated from our gDNA assembly. Fragments were verified using a translated nucleotide query blast with invertebrate codon usage (blastx NCBI), and their orientation determined by gene annotation in comparison to the published 9.7 kb section of the P. rubra mitochondrial genome26. Primers were designed in conserved gene regions to:

(1) Amplify across the ‘N-stretches’ present in the 5.8 kb and 4 kb fragments (8 and 9 N-stretches respectively, all of arbitrary length 50 base pairs).

(2) Cover the whole 1.2 kb fragment, with the aim of resolving the two frameshift mutations within the assembled sequence.

(3) Close the circular mitochondrial genome, by joining the 5.8 kb fragment to the 1.2 kb fragment; the 1.2 kb fragment to the 4 kb fragment; and the 4 kb fragment to the 5.8 kb fragment (see Supplementary Figure S5). Amplification of the fragments joining the 1.2 kb fragment to the 4 kb fragment and to close the mitochondrial genome using standard PCR cycling were unsuccessful. These were repeated using a touchdown protocol with Expand Long-Range polymerase. Annealing temperature was set at 65 °C with decreasing 2 °C intervals every 2 cycles down to 49 °C. Initial elongation time was calculated as before, increasing 30 sec every two cycles of the touchdown, with a final 6 cycles at 49 °C. This successfully amplified the region joining the 1.2 kb fragment to the 4 kb fragment, but we could not close the circular genome. Design of three new forward and reverse primers, tried in all combinations and using variable PCR parameters were unsuccessful in closing the mitochondrial genome. Additional RNA-Seq and DNA genomic sequencing data corroborated the stretches of sequence at either end of the mitochondrial genome but did not aid in closing the circle.

Three mitochondrial contigs of size ~13 kb, ~3.5 kb and ~1.3 kb were identified from I. pulchra Trinity transcriptome assembly from total RNA sequencing. A further contig of ~19 kb was also identified, covering the entire ~1.3 and 13 kb regions, and ~2.4 kb of the 3.5 kb sequence. Fragments were verified using blastx, NCBI, as outlined for P. rubra, and approximations for the location of protein-coding genes and tRNAs determined using the MITOS mitochondrial genome annotation server45 (http://mitos.bioinf.uni-leipzig.de/help.py). Primers were designed to span the 13 kb contig in two ~5 kb sections, and to join the 13 kb contig to the 3.5 kb contig in both directions, to close the mitochondrial genome and check the validity of the duplicated region (Fig. 3). RNA-Seq data for I. pulchra were mapped to the long transcriptome assembly contigs and PCR sequencing results using NextGenMap46, and visualised using Tablet47.

We accidentally co-sequenced A. ylvae at very low coverage as part of a P. rubra genome sequencing experiment. From an initial paired end assembly of Illumina HiSeq data with the CLC assembly cell software (v.5.0) we extracted the full mitochondrial circle of A. ylvae in a single contig identified with BLAST. This was then annotated using MITOS and manual refinement as described above.

Genome annotation

For P. rubra, all sequenced fragments were aligned against the initial scaffold of the 9.7 kb published sequence26; the 5.8 kb, 4 kb and 1.2 kb genome assembly sequences; and an additional long genome assembly fragment of length 14,954 (see Supplementary Figure S5). All contigs and PCR sequencing results were similarly aligned for I. pulchra, but without a reference sequence (see Fig. 3). Alignments were visualised using Mesquite (http://mesquiteproject.org) with invertebrate mitochondrial translated amino acid state colour coding. Where ambiguity remained between PCR sequencing results and genome or transcriptome assembly fragments, the genome or transcriptome assembly nucleotide sequence was used to establish a final ‘consensus’ sequence for each mitochondrial genome. This was of particular relevance for repetitive AT regions – for example, within P.rubra nad1, for which PCR sequencing results were inconclusive. In the case of I. pulchra, where the validity of the duplicated sections could not be confidently determined, we resolved a consensus sequence of 18, 725 base pairs (Fig. 3).

The region for each protein-coding mitochondrial gene (nad16, nad4l, cox13, cob, atp6 and atp8) in the P. rubra, I. pulchra and A. ylvae sequences were compared against published mitochondrial genomes using a translated nucleotide query (blastx, NCBI) with NCBI translation table number 5 ‘invertebrate mitochondrial’. Published genes from the mitochondrial genomes of the acoels Symsagittifera roscoffensis and P.rubra were downloaded from the NCBI Nucleotide database and aligned to the new consensus gene sequences of both P. rubra, I. pulchra and A. ylvae to verify the location of protein-coding and ribosomal RNA-encoding genes. The 5′ end of protein-coding genes were inferred to start from the first in-frame start codon (ATN, GTG, TTG, or GTT), even if this appeared to overlap with the preceding gene. Similarly, the terminal of protein-coding genes was inferred to be the first in-frame stop codon (TAA, TAG, or TGA). If no stop codon was present, a truncated stop-codon (T– or TA-) prior to the beginning of the next gene was assumed to be the termination codon, completed by post-transcription polyadenlylation. tRNA sequences and putative secondary structures were identified using the Mitfi program within MITOS.

Sequence alignment, phylogenetic analysis, and evolutionary rates

Phylogenetic analysis was performed using a concatenated amino acid alignment of all thirteen protein-coding genes for P. rubra, and all eleven protein-coding genes present in I. pulchra. Nucleotide sequences from all three acoel taxa and an additional set of species comprising 54 metazoans, taken from a range of published metazoan mitochondrial genomes representing deuterostomes, protostomes, cnidarians, and two species of poriferans as an outgroup (Supplementary Table S2) were aligned using TranslatorX (http://www.translatorx.co.uk/) independently for all genes with the appropriate mitochondrial genetic code set for each taxon included, using ClustalOmega48 for amino acid alignment (Supplementary Table S2). Protein alignments were reduced to the most informative residues using trimAl v.1.4.rev1549 with standard settings. Regions showing ambiguity in alignment were excluded, so that only blocks of well-aligned sequence were included for analysis.

We initially re-constructed neighbour nets in SplitsTrees v.450 to screen our dataset for potentially non-treelike patterns, which could impede our phylogenetic analysis. Subsequently, we used RAxML51 v. 8.2.9 to infer maximum likelihood phylogenies from the original and the reduced alignments under the MTZOA model52. Bootstrapping was conducted employing the ‘autoMRE’ option in RAxML and the trees visualised with figtree v.1.4 (http://tree.bio.ed.ac.uk/software/figtree). We carried out Bayesian inference on the same trimmed alignment with PhyloBayes v.4.153 under the MTZOA model. We ran 10 chains in parallel and stopped the tree search at ~3950 trees per chain, with a maximum difference of 0.17, when discarding 400 trees as burnin and sampling every 10th tree per chain.

We used the Geneious software (v.8) to calculate sequence differences between a 9.7 kb section of the P. rubra mitochondrial genome originating from worms sampled in Filey, Yorkshire (UK) and Barcelona (Spain)26. For eight protein-coding genes found on this section we used ParaAT (v2.0)54 to calculate translation alignments and the KaKs calculator (v1.2)55 to access substitution rates. Also using Geneious we calculated a difference matrix for the cox1 barcoding gene of the two P. rubra populations in comparison to acoel coI sequences retrieved from Genbank.