Hypodontus macropi sensu lato (Strongyloidea) is a nematode that occurs in the terminal ileum, caecum or colon of a range of macropodid marsupials (kangaroos and wallabies) and was first described as a hookworm-like nematode [1, 2]. Various studies of H. macropi from hosts across vast geographical distances in Australia [37] did not reveal any significant morphological variation in the nematode among different host species, but particular predilection sites in the gut as well as host affiliations suggested that H. macropi represented a complex of morphologically indistinguishable, but genetically distinct species or groups which have a relatively high level of host specificity. To test this proposal, Chilton et al. [8] conducted a multilocus enzyme electrophoretic (MEE) analysis and subsequently [9] determined genetic variation in the second internal transcribed spacer (ITS-2) of nuclear ribosomal DNA (rDNA), and provided evidence of genetic differences in H. macropi.

Using single-strand conformation polymorphism (SSCP) and selective sequence analysis of the ITS-2, genetic variation in the nematode was found [10], with nucleotide sequence variability mainly in loops or bulges of the predicted secondary structure. Moreover, in the most extensive investigation of H. macropi to date [11], a molecular-phylogenetic analysis of 547 specimens representing all ten known macropidid host species of this nematode from across the Australian continent revealed clear genetic clades in each of Macropus agilis, M. dorsalis, M. rufogriseus, M. bicolor, Petrogale persephone, Thylogale billardierii and T. stigmatica. Another clade was represented by all specimens from M. robustus robustus and M. rufus, together with two examples of host switching by the nematode into M. fuliginosus.

These findings showed that H. macropi, as currently defined morphologically, might represent as many as ten cryptic species. However, to date, DNA sequencing studies have been conducted exclusively using the second internal transcribed spacer (ITS-2) of nuclear ribosomal DNA, which represents species prospecting (as defined in [12]), rather than clear evidence for the existence of separate species. Although ITS-2 has consistently provided specific identification of strongyloid nematodes [1315], there is a need to provide independent evidence of genetic identity and differentiation. Various studies have shown that mt genomes provide barcodes for organisms [1621]. Although nucleotide variation within species of nematodes is relatively high for the mt genes studied to date [18] and, thus, is not useful for specific identification, this is not the case for mt protein sequences. Indeed, amino acid sequence variation within species of nematodes examined to date is usually very low (0–1.3%) [1719]. Therefore, amino acid sequences derived from the mt genome provide barcodes for species and for studying the systematics (taxonomy and phylogeny) of nematodes [20]. Importantly, phylogenetic analysis of mt amino acid datasets usually provides strong statistical support for the relationships of nematodes, which is not achieved using data from short sequence tracts.

In the present study, we sequenced the complete mt genomes of individuals of H. macropi, proposed to be distinct cryptic species based on MEE and ITS-2 data sets [811] from three different species of macropodid hosts. We undertook a comparative analysis of the amino acid sequence datasets derived from these genomes and conducted a phylogenetic analysis of these datasets using homologous datasets for other strongyloid nematodes for comparison.


Characteristics and comparisons of mt genomes

The three mt genome sequences of three operational taxonomic units (OTUs) OTU-C, OTU-G and OTU-J of H. macropi from M. r. robustus, M. bicolor and T. billardierii and of Macropicola (Ma.) ocydromi (outgroup), were assembled from 73,189, 29,997, 20,3341 and 127,146 individual sequence reads, respectively. For individual OTUs, the number of contigs resulting from each assembly ranged from 69 to 508, with the longest and shortest contigs being 11,170 and 41 bp in length.

The three circular mt genome sequences of OTU-C, OTU-G and OTU-J of H. macropi were 13699 bp, 13655 bp and 13634 bp in length, respectively (Figure 1). The mt genome of a closely related nematode, (Ma. ocydromi, was 13517 bp in length (Figure 1). Consistent with all species of related nematodes sequenced to date [18, 20, 21], the mt genomes of all three H. macropi OTUs and Ma. ocydromi contained 12 protein coding genes [adenosine triphosphatase subunit 6 (atp6), cytochrome b subunit (cytb), cytochrome c subunits 1–3 (cox1-3), and the nicotinamide dehydrogenase subunits 1–6 and 4 l (nad1-6 and 4 l) genes], two ribosomal subunits [large (rrnL) and small (rrnS) subunit] and 22 transfer RNA (tRNA) genes (including two leucine and two serine tRNA genes). The overall mt genome structure for all OTUs of H. macropi and Ma. ocydromi was consistent with the gene arrangement GA2 (arrangement of mt genes) representing all other nematodes of the order Strongylida investigated to date [18, 20]. The positions, lengths and start/stop codons of individual genes as well as amino acid sequence lengths of predicted proteins of the three OTUs are shown in Table 1. As expected for other strongyloid nematodes [18], no atp8 subunit is present, and all genes are transcribed from the forward strand; all protein coding genes had open reading frames (ORFs). For all three OTUs, the lengths of the protein-coding genes were in the following order: nad5 > cox1 > nad4 > cytb > nad1 > nad2 > cox3 > cox2 > atp6 > nad6 > nad3 > nad4L (Table 1). The longest gene was nad5, and the amino acid lengths of cox1, nad4L and cox3 were the same for all OTUs (Table 1). The predicted nucleotide and amino acid sequences of each of the 12 protein-coding genes of H. macropi and Ma. ocydromi were compared. The mt genes of these OTUs had ATT, ATA and ATG as initiation codons; except for ATC as its initiation codon for atp6. All genes had complete termination codons, i.e., TAA and TAG, in contrast to some nematodes, such as Ascaris suum, Necator americanus and Trichinella spiralis, which have abbreviated stop codons (e.g., TA or T) for some genes [2224].

Figure 1
figure 1

Schematic representation of the circular mitochondrial genome of Hypodontus macropi and Macropicola ocydromi. Three operational taxonomic units (OTUs) of H. macropi were from three different hosts (i.e., OTU-C from Macropus robustus robustus; OTU-G from Macropus bicolor; OTU-J from Thylogale billardierii) and M. ocydromi was from Macropus fuliginosus. Transfer RNA genes are designated using one-letter amino acid codes. Ribosomal genes are shaded.

Table 1 Summary of mitochondrial genomes of Hypodontus macropi from three different hosts and Macropicola ocydromi

Twenty-two tRNA genes were predicted from each mt genome of the three OTUs of H. macropi, and these genes varied from 53–64 nucleotides in length. Their positions in the genome were the same for all OTUs and for Ma. ocydromi (Table 1). The two ribosomal RNA genes (rrnS and rrnL) were separated by nad3, nad5, nad6, nad4L and 5 trn genes (Figure 1). The lengths of these rrn genes were 700 bp and 959 bp, respectively, which is similar to those reported previously for various nematodes, including Ancylostoma duodenale, Necator americanus, Onchocerca volvulus, Strongyloides stercoralis and Xiphinema americanum[23, 2528].

The AT-content of the mt genomes of H. macropi was 70-78%. Within the Strongylida, AT-richness in the mt genome usually ranges from 76.6-77.2% [25, 29]. The nucleotide composition of the entire mt genome is biased toward T, and C is least favoured (Table 2), which is similar to other nematodes including Caenorhabditis elegans, Dirofilaria immitis and Onchocerca volvulus[3032]. The AT content of the rrn genes for H. macropi and Ma. ocydromi ranged from 75–77% (Table 2). Given the AT bias (70-78%) in the mt genome of H. macropi, there was considerable bias in codon usage. Consequently, ATR (methionine), ATY (isoleucine), TTR (leucine), and TTY (phenylalanine) were the most commonly used codons (6.6-7.2%, 6.4-7.6%, 7.2-18% and 12.7-13%, respectively) (Table 3).

Table 2 Nucleotide composition (%) for the entire or regions of the mitochondrial genomes determined herein
Table 3 Codon usages (%) in mitochondrial protein genes of Hypodontus macropi from three different hosts

Genetic differences and phylogenetic relationships

Pairwise comparisons of the amino acid (aa) sequences predicted for each of the 12 protein coding mt genes from OTU-C, OTU-G and OTU-J of H. macropi from the hosts M. robustus, M. bicolor and T. billardierii, respectively, are shown in Table 4. Most variation was detected in NAD6 between OTU-C and OTU-J (29%) and between OTU-G and OTU-J (26.9%), and in NAD4L between OTU-C and OTU-G (15.4%); moderate variation was detected in NAD1, NAD2, NAD3, NAD4 and NAD5 (mean: 9.8%) among the OTUs; least variation was detected in COX1, COX2 and COX3 (1.7-4.3%). Pairwise comparisons of concatenated amino acid sequences revealed sequence differences of 5.8% (OTU-C versus OTU-G), 18% (OTU-C versus OTU-J) and 10.0% (OTU-G versus OTU-J) (Table 4). These percentages are substantially higher than those detected within species of strongyloid nematodes [17, 18, 23]. Pairwise comparisons of the concatenated amino acid sequence of Ma. ocydromi with those of OTU-C, OTU-G and OTU-J revealed sequence differences of 30.4%, 15.9% and 31.0%, respectively (Table 5). The phylogenetic analysis of the aa sequence data showed that all three OTUs of H. macropi and Ma. ocydromi formed a distinct group, with maximum statistical support (posterior probability [pp] = 1.00), to the exclusion of the other strongyloid nematodes included here (i.e., Oesophagostomum dentatum, Chabertia ovina and Strongylus vulgaris) (Figure 2). In this analysis, OTU-C from M. r. robustus grouped together with Ma. ocydromi (pp = 1.00) to the exclusion of OTU-G (M. bicolor) and OTU-J (T. billardierii). While the three OTUs of H. macropi did form a group with C. ovina, statistical support for this grouping was moderate (pp = 0.74) (Figure 2).

Figure 2
figure 2

Phylogenetic analysis of the concatenated amino acid sequences for protein coding genes of Hypodontus macropi . Concatenated amino acid sequence data for all protein coding mitochondrial genes for three operational taxonomic units of H. macropi, from Macropus robustus robustus, Macropus bicolor, Thylogale billardierii, and from Macropicola ocydromi sequenced here as well as other concatenated sequence data representing complete mitochondrial genomes of Chabertia ovina, Oesophagostomum dentatum and Strongylus vulgaris (Strongyloidea) were analyzed using Bayesian Inference. The numbers above each tree branch represent the statistical support for each node (based on posterior probability score).

Table 4 Pairwise comparison of the amino acid sequences of the 12 protein coding mitochondrial genes
Table 5 Pairwise comparison of the amino acid sequences of the 12 protein coding mitochondrial genes

Sliding window analyses

Sliding window analyses were carried out to identify conserved and variable regions in mt genomes among all three OTUs. Results from these analyses are shown in Figure 3 for comparison across all the three OTUs and also for nucleotide diversities calculated from pairwise comparisons across the mt genomes of OTU-C, OTU-G and OTU-J. A nucleotide diversity pattern was broadly similar across the sliding window analyses and diversity within and between genes was relatively consistent across the genera. Greatest nucleotide diversity was detected within AT-rich (AT) region, followed by peaks of variation within NAD3, NAD4, NAD2 and NAD5. Gene-by-gene nucleotide diversity was highly variable, but, by far, the least variation was recorded within COX2. Overall, the full sliding window indicates a wealth of new genes capable of providing high levels of nucleotide variation for H. macropi population genetic studies.

Figure 3
figure 3

Sliding window analysis of complete mt genome sequences of Hypodontus macropi from different hosts. The three operational taxonomic units (OTUs) of H. macropi were from Macropus robustus robustus (OTU-C), Macropus bicolor (OTU-G) and Thylogale billardierii (OTU-J). The black line indicates nucleotide diversity comparing all the three OTUs; whereas three other symbols (a square, a circle and a plus sign) indicate pairwise comparisons across the mt genomes of OTU-C, OTU-G and OTU-J. Nucleotide diversity, measured iteratively every 10 bp over 300 bp windows of aligned mtDNA sequence data indicate peaks and troughs of sequence variability. Linearized maps of mtDNA are provided for each aligned data set, with the highest peaks of variability falling within the non-coding AT-rich regions.


Molecular tools offer an unprecedented opportunity to include new components in our studies on the discovery and description of parasite species. For example, molecular markers in the identification of parasite species can be used to (i) link morphologically indistinguishable life stages, such as larvae, to the adult stages of known species, (ii) elucidate life cycles by establishing the species that may serve as intermediate or paratenic hosts for larval stages of a parasite, and (iii) search for cryptic species [12]. However, Nadler and Pérez-Ponce de León [33] suggested that, in terms of explanatory power, single locus DNA barcodes and the barcoding-gap are insufficient approaches to delimit species, and concordance of independent information, including other genes, is required. In the present study, we sequenced the whole mt genomes of individuals of H. macropi, proposed to be distinct cryptic species based on MEE analysis and the ITS-2 sequences of specimens [811] from three different species of macropodid marsupial hosts. Based on the analysis of the complete mt genomes of three cryptic species, within the morphospecies H. macropi, the present study provides independent support that specimens of H. macropi from three species of macropodid hosts represent three genetically distinct species. The present mt genomic results support previous findings [811] from MEE and nuclear DNA-based analyses and indicate that H. macropi is represented by a complex of at least ten sibling species, one in each of M. agilis, M. bicolor, M. dorsalis, M. rufogriseus, P. persephone, T. billardierii and T. stigmatica, with the remaining three species occurring in M. r. robustus and M. rufus.

Although the ITS-2 rDNA has been shown to be a reliable genetic marker to distinguish among a variety of strongyloid nematodes [1315] and this marker has been useful for discovering sibling species within various cloacinine nematodes in macropodid marsupials [3438], hypotheses regarding such species need to be independently tested, preferably with sequence data from additional loci [33]. For this purpose, we chose mtDNA genes because of their relatively rapid rate of substitution, coupled to the smaller effective population size of mtDNA and the provision of evidence of lineage exclusivity in a shorter period of time (on average) following speciation [39, 40]. Using mt datasets, based upon pairwise comparisons of concatenated amino acid sequences inferred herein, we found a wide range of sequence differences of among different OTUs (5.8 – 18%) (see Table 4), and this magnitude is consistent with the previous findings of Chilton et al. [11] using the ITS-2 rDNA. These authors also found a wide variation in the ITS-2 sequences among three OTUs i.e., genotype (G) 8 (OTU-G, M. bicolor), G13 (OTU-C, M. r. robustus), and G36 (OTU-J, T. billardierii) of H. macropi. Based on the ITS-2 sequences, the highest sequence variation detected was between OTU-C [GenBank: HE866725] and OTU-G [GenBank: HE866720] (33.3%), followed by OTU-C [GenBank: HE866725] and OTU-J [GenBank: HE866748] (27.8%) and OTU-G [GenBank: HE866720] and OTU-J [GenBank: HE866748] (25.8%).

In nematodes, mt DNA is usually more variable in sequence within a species than the ITS-2 and other rDNA regions [14], indicating that mt gene regions are well suited for studying the population genetics of parasitic nematodes [14, 17, 41]. The sliding window analysis conducted herein displayed distinct patterns of nucleotide diversity among the three mt genomes representing OTUs-C, -G and -J of H. macropi (see Figure 3). In the present study, COX2 was revealed to be the gene with the lowest nucleotide diversity among these three OTUs, whereas NAD2, NAD3, NAD5 and NAD6 showed higher nucleotide diversity, which can be targeted to design primers for studying the population genetics of H. macropi. Given that cryptic species have been detected within the morphospecies H. macropi[811], low sequence variability is useful for the design of oligonucleotide primers that flank mt regions with high variability. Utilizing such primer sets, PCR-based SSCP analysis [42] could be applied to screen large numbers of individual specimens representing different host species and populations for cryptic species. Previous studies [10, 11] have shown the merit of SSCP for exploring the genetic variation in various populations of H. macropi, and this approach could be applied to large-scale studies of H. macropi specimens representing distinct OTUs. It is hoped that major advances in the next-generation sequencing technology (e.g., Illumina) [43] will allow selective whole genome sequencing of cryptic species of nematodes, thereby helping to re-classify them by employing detailed phylogenetic and comparative genetic analyses.


The present analyses of mt proteomic sequence datasets clearly supported the hypothesis that H. macropi from M. robustus robustus, M. bicolor and T. billardierii represent distinct species. This study emphasizes the importance and utility of the mt genomic datasets for parasites from macropodid marsupials as a basis for systematic, ecological and biological studies, and the specific diagnosis of infections. In particular, studying cryptic species within the morphospecies H. macropi provides a stimulus to re-explore the relationships of sibling species using complete mt genomic and proteomic sequence data sets. Although the present study focused on H. macropi, the approach employed here has major implications for the study of a wide range of invertebrates and vertebrates.


Collection of nematodes and isolation of genomic DNA

Adult H. macropi individuals were collected opportunistically from the intestines of M. r. robustus, M. bicolor and T. billardierii. The nematodes from these three host species were designated as operational taxonomic units (OTUs), namely OTU-C, OTU-G and OTU-J, respectively, according to Chilton et al. [8]. A specimen of a related species, Macropicola ocydromi, was also collected from Macropus fuliginosus and included for genetic comparison with OTUs of H. macropi. Individual worms were stored (in individual tubes) at −80°C until required. Upon thawing, individual nematodes were washed extensively in physiological saline, pH 7.4. The anterior and posterior ends of each nematode were excised, cleared in lactophenol for identification and then stored in 70% ethanol and retained as voucher specimens, which have been deposited in the South Australian Museum (SAM 46024, 46038 and 46053; see Chilton et al. [11]), Adelaide, Australia. The remainder of each worm was used for the extraction of genomic DNA using a small-scale sodium dodecyl-sulphate (SDS)/proteinase K digestion and column-purification (Wizard DNA Clean-Up Kit, Promega, USA) [42]. The identity of each specimen was verified by PCR-based amplification of the ITS-2 using an established method, followed by mini-column purification (Wizard PCR-Preps, Promega) of the amplicon and subsequent automated (bidirectional) sequencing (BigDye chemistry v.3.1) [42].

Sequencing and assembly of mt genomes

The complete mt genome of each worm was amplified by long-PCR (Advantage 2 kit; BD Biosciences) as two overlapping amplicons (“large” and “small”), using two (relatively) conserved primer sets, 5 F-40R and 39 F-42R, respectively [26, 44] and with appropriate positive and negative (i.e., no template) controls. The cycling conditions used were 95°C for 1 min (initial denaturation); then 95°C for 15 s (denaturation), 50°C (for primer set 39 F-42R) or 55°C (for primer set 5 F-40R) for 15 s (annealing), and 60°C for 3 min (for primer set 39 F-42R) or 68°C for 6 min (for primer set 5 F-40R) (extension) for 35 cycles, and a final extension at 60°C for 3 min (for primer set 39 F-42R) or 68°C for 6 min (for primer set 5 F-40R). Products were consistently produced from the positive control samples (Ascaris suum); in no case was a product detected for any of the negative controls. Products were detected in 1% (w/v) agarose gels after ethidium-bromide staining and ultraviolet transillumination, and were then purified over a mini-column (Wizard, Promega) and quantified spectrophotometrically (ND-1000 UV–VIS spectrophotometer v.3.2.1; NanoDrop Technologies). Following electrophoretic verification of quality, the two amplicons (~5 kb and ~10 kb; 2.5 μg of each), spanning the mt genome of each worm specimen, were pooled and subsequently sequenced using the 454 Genome Sequencer FLX (Roche) [45]. The consensus mt genome sequences [GenBank: KF361317-KF361320] were each assembled automatically, using the CAP3 program [46], from individual reads (of ~300 bp).

Bioinformatic pipeline for mt genomic annotation and analysis of sequence data

Following assembly, the genes and features of each mt genome from each worm were annotated using a semi-automated pipeline [21]. In brief, each protein coding mt gene was identified by local alignment comparison (performed in all six reading frames) using amino acid sequences predicted (employing invertebrate mitochondrial code) from the mt genome of a reference species (i.e., Oesophagostomum dentatum [GenBank: NC_013817]. The large and small ribosomal RNA subunit genes (rrnS and rrnL) were identified by local alignment (i.e., using nucleotide sequence data) using the same approach. All transfer RNA (tRNA) genes were detected and identified in a three-step process. Initially, all possible tRNA genes present in each consensus sequence were predicted (from both strands) based on a folding structure, using scalable models based on the standard nematode mt tRNAs [18]. All predicted tRNA genes were clustered into groups based on their anti-codon sequence and provisionally identified based on the amino acid encoded by this anti-codon. Two separate tRNA gene groups were predicted each for leucine (one each for the anticodons CUN and UUR, respectively) and for serine (one each for the anticodons AGN and UCN, respectively), as these tRNA genes have been shown to be duplicated in most invertebrate mt genomes, including those of nematodes [18]. All predicted tRNAs within each amino acid group were ranked based on structural “strength” (as inferred by the number of mismatched nt pairs in each stem), and the 100 best-scoring structures for each group were compared by BLASTn alignment against a database comprising all tRNA gene sequences for each amino acid of all published nematode mt genome sequences (available via; [47]). All tRNA genes of each mt genome were then identified and annotated based on having the highest sequence identity to known nematode tRNA genes. Annotated sequence data were imported into the program SEQUIN (available via for final verification of the mt genomic structure and subsequent, direct submission to the GenBank database.

Alignment and phylogenetic analysis of concatenated nucleotide or amino acid sequence data

The amino acid sequences were conceptually translated from the mt genomes of the three individuals of H. macropi and the specimen of Ma. ocydromi, and then concatenated separately. The amino acid sequences derived from published mt genomes from nematodes [Strongylida: Chabertia ovina [GenBank: NC_013831], Oesophagostomum dentatum [GenBank: NC_013817] and Strongylus vulgaris [GenBank: NC_013818] [see [21]] were also used for comparative purposes. These amino acid sequences (considering all homologous characters) were aligned using MUSCLE [48]. The amino acid sequences of H. macropi and Ma. ocydromi were then aligned using BioEdit [49] and subjected to phylogenetic analysis using Bayesian inference (BI), employing the Markov chain Monte Carlo (MCMC) method in MrBayes 3.1.2 [50, 51]. The likelihood parameters for BI were based on the Akaike Information Criteria (AIC) test in Prottest v2.4 [52]. The “best” model for the amino acid sequences was a fixed-rate Poisson model [53]. Estimates of the base frequencies, the substitution rate model matrix and the proportion of invariable sites were fixed. Posterior probabilities (pp) were calculated using 2,000,000 generations, employing four simultaneous tree-building chains, with every 100th tree being saved.

Sliding window analysis

This analysis was performed on the aligned, complete mt genome sequences of the three OTUs of H. macropi using DnaSP v.5 [54]. The alignment of these sequences was achieved using MUSCLE v.3.8 [47], as implemented in SeaView v.4 [55]. Keeping the nucleotides in frame, there were no ambiguously aligned regions. A sliding window of 300 bp (steps of 10 bp) was used to estimate nucleotide diversity (π) between the three OTUs of H. macropi; indels were excluded using DnaSP. Nucleotide diversity for the entire alignments was plotted against midpoint positions of each window, and gene boundaries were defined.