Protostrongylus rufescens is a metastrongyloid nematode of small ruminants, including sheep and goats (definitive hosts) in most parts of the world [1]. The dioecious adults of this nematode live in the respiratory system (terminal bronchioles and alveoli) of the definitive host. Here, the females produce eggs, from which first-stage larvae (L1s) hatch within the airways of the lung. L1s then migrate via the bronchial/tracheal escalator to the pharynx, are swallowed and are then excreted in the faeces. L1s infect a molluscan intermediate host (snail) and then develop, under favourable environmental conditions, into third-stage larvae (L3) [1]. L3s within an infected intermediate host are then ingested by the ruminant host, penetrate the gut wall and then migrate via the lymphatic system or blood stream to the lungs, where they develop to adult worms. The prepatent period is reported to be ~ 4–9 weeks [2]. Although P. rufescens infection is widespread, it does not usually cause major clinical disease. Nonetheless, pathological changes, characterized by chronic, eosinophilic, granulomatous pneumonia, can be detected upon post mortem examination. Adult worms reside mainly in the bronchioles and alveoli, and are surrounded by macrophages, giant cells, eosinophils and other inflammatory cells which produce grey or beige plaques (1–2 cm) under the pleura in the dorsal border of the caudal lung lobes [3].

Little is known about fundamental aspects of the epidemiology and ecology of P. rufescens. Molecular tools employing suitable genetic markers can underpin fundamental studies in these areas, with a perspective on investigating transmission patterns linked to particular genotypes of a parasite and on discovering population variants or cryptic species [4, 5]. Advances in nucleic acid sequencing and bioinformatics have provided a foundation for characterizing the mitochondrial (mt) genomes from parasitic nematodes as a source of genetic markers for such explorations. Here, we used an established, massively parallel sequencing-bioinformatics pipeline [6] for the characterization of the mt genome of P. rufescens, which we compared with those of related metastrongyloid nematodes, for which mt genomic sequence data are available. We also studied the genetic relationships among these lungworms and selected representatives within the order Strongylida, and suggest that selected regions in the mt genome of P. rufescens should serve well as markers for future studies of the ecology and epidemiology of this nematode around the world.


Parasite and genomic DNA isolation

Adult worms of P. rufescens were collected from the lungs of a fresh sheep cadaver in Victoria, Australia, washed extensively in physiological saline and then stored at −80°C. Upon thawing, genomic DNA was isolated from a single adult male specimen using an established method of sodium dodecyl-sulphate (SDS)/proteinase K digestion and subsequent mini-column purification [7]. The identity of the specimen was verified by PCR-based sequencing (BigDye chemistry v.3.1) of the second internal transcribed spacer (ITS-2) of nuclear ribosomal DNA [7].

Long-PCR, sequencing and mt genome assembly

From the genomic DNA extracted from the single male worm, the complete mt genome was amplified by long-PCR (BD Advantage 2, BD Biosciences) as two overlapping amplicons (~5 kb and ~10 kb), using the protocol described by Hu et al. [8], with appropriate positive (i.e., Haemonchus contortus DNA) and negative (i.e., no template) controls. Amplicons were consistently produced from the positive control samples; in no case was a product detected for the negative controls. Amplicons were then treated with shrimp alkaline phosphatase and exonuclease I [9], and quantified by spectrophotometry. Following agarose electrophoretic analysis, the two amplicons (2.5 μg of each) were pooled and subsequently sequenced using the 454 Genome Sequencer FLX (Roche) [10] according to an established protocol [6]. The mt genome sequence was assembled using the program CAP3 [11] from individual reads (of ~300 bp).

Annotation and analyses of sequence data

Following assembly, the mt genome of P. rufescens was annotated using the bioinformatic annotation pipeline developed by Jex et al. [6]. Briefly, the open reading frame (ORF) of each protein-coding mt gene was identified (six reading frames) by comparison to those of the mt genome of Angiostrongylus vasorum [GenBank: JX268542; [12]]. The genes for the small and large subunits of mt ribosomal RNA (rrn S and rrn L, respectively) were identified by local alignment. The transfer RNA (tRNA) genes were predicted (from both strands) based on their structure, using scalable models based on the standard mt tRNAs for nematodes [5]. Predicted tRNA genes were then grouped according to their anticodon sequence and identified based on the amino acid encoded by the anticodon. Two separate tRNA gene groups were predicted each for leucine (Leu) (one each for the anticodons CUN and UUR, respectively) and for serine (Ser) (one each for the anticodons AGN and UCN, respectively), as these tRNA genes are duplicated in many invertebrate mt genomes, including those of nematodes [5]. All predicted tRNAs for each amino acid group were ranked according to the “strength” of their structure (inferred based on minimum nucleotide mismatches in each stem); for each group, the 100 best-scoring structures were compared by BLASTn against a database comprising all tRNA genes for each amino acid for all published mt genome sequences of nematodes [13]. The tRNA genes were then identified and annotated based on their highest sequence identity to known nematode tRNAs. Annotated sequence data were imported using the program SEQUIN, the mt genome structure was verified and the final sequence was submitted as an SQN file to the GenBank database.
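The ranking of candidate tRNA structures by stem mismatches can be illustrated with a minimal Python sketch. It is not the pipeline of Jex et al. [6]; the function name `stem_mismatches`, the option to tolerate G-T wobble pairs and the toy stems are assumptions made for illustration only.

```python
# Hypothetical sketch: count non-pairing positions in one tRNA stem.
# Both strands are written 5'->3' as DNA, so the 3' strand is reversed
# before pairing. Tolerating G-T wobble pairs is an assumption.

WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "T"), ("T", "G")}


def stem_mismatches(five_prime: str, three_prime: str, allow_wobble: bool = True) -> int:
    """Return the number of positions in a stem that do not form a base pair."""
    if len(five_prime) != len(three_prime):
        raise ValueError("stem strands must have equal length")
    allowed = WATSON_CRICK | (WOBBLE if allow_wobble else set())
    pairs = zip(five_prime.upper(), reversed(three_prime.upper()))
    return sum(1 for pair in pairs if pair not in allowed)


# Toy candidates (not real P. rufescens sequences): each is a 7 bp stem.
candidates = {
    "cand_a": ("GCATTAG", "CTAATGC"),  # fully paired
    "cand_b": ("GCATTAG", "CTAATGA"),  # one mismatch at the first pair
}

# Fewer mismatches means a "stronger" structure, so sort ascending.
ranked = sorted(candidates, key=lambda name: stem_mismatches(*candidates[name]))
```

In a fuller implementation the score would be summed over every stem of a candidate before ranking; here a single stem stands in for the whole structure.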

Phylogenetic analysis of concatenated amino acid sequence datasets

The amino acid sequences were predicted from individual mt genes of P. rufescens and of other nematodes, including An. cantonensis, An. costaricensis, An. vasorum, Metastrongylus pudendotectus and M. salmi [GenBank: GQ398122, JX268542, GQ398121, GQ888714 and GQ888715, respectively; Metastrongyloidea]; Ancylostoma caninum and Necator americanus [GenBank: FJ483518 and NC_003416, respectively; Ancylostomatoidea]; H. contortus and Trichostrongylus axei [GenBank: NC_010383 and GQ888719, respectively; Trichostrongyloidea]; Oesophagostomum dentatum and Strongylus vulgaris [GenBank: GQ888716 and GQ888717, respectively; Strongyloidea]; and Strongyloides stercoralis [GenBank: AJ558163; Strongyloidoidea] [6, 12, 14–20] (Table 1). All amino acid sequences were aligned using the program MUSCLE [21] and then subjected to phylogenetic analysis. For this analysis, best-fit models of evolution were selected using ProtTest 3.0 [22] employing the Akaike information criterion (AIC) [23]. Bayesian inference analysis was conducted using MrBayes 3.1.2 [24], with a fixed mtREV amino acid substitution model [25], using four rate categories approximating a Γ distribution, four chains and 200,000 generations, sampling every 100th generation. The first 200 generations were removed from the analysis as burn-in.
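The concatenation step can be sketched in Python as follows. The sketch assumes that each protein was aligned separately before the alignments were joined, which the text does not state explicitly; the function name `concatenate_alignments` and the toy data are hypothetical.

```python
# Hypothetical sketch: join per-gene protein alignments into a single
# concatenated alignment, one row per taxon, in a fixed gene order.


def concatenate_alignments(
    gene_alignments: dict[str, dict[str, str]], gene_order: list[str]
) -> dict[str, str]:
    """Return {taxon: concatenated aligned sequence} over all genes."""
    taxa = set(gene_alignments[gene_order[0]])
    parts: dict[str, list[str]] = {taxon: [] for taxon in taxa}
    for gene in gene_order:
        alignment = gene_alignments[gene]
        if set(alignment) != taxa:
            raise ValueError(f"taxa differ for gene {gene}")
        if len({len(seq) for seq in alignment.values()}) != 1:
            raise ValueError(f"rows of unequal length in gene {gene}")
        for taxon in taxa:
            parts[taxon].append(alignment[taxon])
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy alignments (invented residues, two taxa, two genes).
toy = {
    "cox1": {"taxon_a": "MFL-V", "taxon_b": "MFLIV"},
    "nad1": {"taxon_a": "MSS", "taxon_b": "MS-"},
}
supermatrix = concatenate_alignments(toy, ["cox1", "nad1"])
```

Checking that every gene covers the same taxa and that rows within a gene have equal length guards against silently shifted columns in the concatenated data.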

Table 1 Details of the whole mitochondrial genome sequences used in this study as reference sequences

Results and discussion

Features of the mt genome

The circular mt genome sequence of P. rufescens [GenBank: KF481953] is 13,619 bp in length (Figure 1). It contains two ribosomal genes, 12 protein-coding genes (cox 1-3, nad 1-6, nad 4L, atp 6 and cyt b) and 22 tRNA genes. The gene arrangement (GA2) in the mt genome of P. rufescens is the same as that of all other strongylid nematodes studied to date [5, 26]. All of the 36 genes are transcribed in the same direction (5′ to 3′) (Figure 1). Overall, the genome is AT-rich, as expected for strongylid nematodes [12, 20, 27, 28], with T being the most favoured nucleotide and C the least favoured. The nucleotide contents are 25.9% (A), 6.8% (C), 18.6% (G) and 48.6% (T) (Table 2). The longest non-coding (AT-rich) region, located between the genes trn A and trn P, is 223 bp in length (see Figure 1); its AT-content is 83.4%, greater than for all other parts of the mt genome (Table 2).
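Nucleotide composition and AT-content of the kind summarised in Table 2 can be computed with a few lines of Python. This is a minimal sketch; the function names and the toy sequence are illustrative and are not taken from KF481953.

```python
from collections import Counter


def nucleotide_composition(seq: str) -> dict[str, float]:
    """Return the percentage of A, C, G and T, ignoring any other symbol."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def at_content(seq: str) -> float:
    """Return the combined percentage of A and T."""
    composition = nucleotide_composition(seq)
    return composition["A"] + composition["T"]


# Toy 20-nt sequence (hypothetical): 5 A, 1 C, 2 G and 12 T.
toy = "ATTTTAGCTTATTTGATTTA"
print(nucleotide_composition(toy))  # {'A': 25.0, 'C': 5.0, 'G': 10.0, 'T': 60.0}
print(at_content(toy))              # 85.0
```

Applying the same functions to a sub-sequence, such as the non-coding region between trn A and trn P, would give the region-specific values.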

Figure 1

Schematic representation of the circular mt genome of Protostrongylus rufescens. Each transfer RNA gene is identified by a one-letter amino acid code in the map (external), and the AT rich region is also indicated. All genes are transcribed in the clockwise direction.

Table 2 Nucleotide composition (%) for the entire or regions of the mitochondrial genome of Protostrongylus rufescens

Ribosomal RNA genes

The rrn S and rrn L genes of P. rufescens were identified by sequence comparison with An. vasorum. The rrn S gene was located between trn E and trn S (UCN), and rrn L was between trn H and nad 3. The two genes were separated from one another by the protein-encoding genes nad 3, nad 5, nad 6 and nad 4L (Figure 1). The sizes of the rrn S and rrn L genes of P. rufescens were 683 bp and 959 bp, respectively. The lengths of these two genes were similar to those of other metastrongyloids for which mt genomes are known (694–699 bp for rrn S, and 958–961 bp for rrn L) [12, 20, 26–28] (Figure 1), and amongst the shortest for metazoan organisms [29].

Protein-coding genes and codon usage

The prediction of initiation and termination codons for the protein-coding genes of P. rufescens (Table 3) revealed that the commonest start codon was ATT (for five of 12 proteins), followed by TTG (four genes), ATA (two genes) and ATG (one gene). Ten mt protein genes of P. rufescens were predicted to have a TAA or TAG translation termination codon. The other two protein genes ended in an abbreviated stop codon, such as T or TA (Table 3).
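The logic of reading off initiation and termination codons, including abbreviated stops, can be sketched in Python. The sketch assumes that an annotated coding sequence whose length is not a multiple of three ends in an incomplete stop codon (T or TA); the function name `classify_ends` and the toy sequences are hypothetical.

```python
# Start codons reported in the text for P. rufescens, and the two
# complete stop codons. Anything else is rejected by this sketch.
START_CODONS = {"ATT", "TTG", "ATA", "ATG"}
FULL_STOPS = {"TAA", "TAG"}


def classify_ends(cds: str) -> tuple[str, str]:
    """Return (start codon, stop signal) for an annotated coding sequence."""
    cds = cds.upper()
    if len(cds) < 4:
        raise ValueError("coding sequence too short")
    start = cds[:3]
    if start not in START_CODONS:
        raise ValueError(f"unexpected start codon {start}")
    remainder = len(cds) % 3
    if remainder == 0:
        stop, valid = cds[-3:], cds[-3:] in FULL_STOPS
    elif remainder == 1:
        stop, valid = cds[-1:], cds[-1:] == "T"    # abbreviated stop "T"
    else:
        stop, valid = cds[-2:], cds[-2:] == "TA"   # abbreviated stop "TA"
    if not valid:
        raise ValueError(f"unexpected termination signal {stop}")
    return start, stop


# Toy sequences (invented): complete stop, then the two abbreviated forms.
print(classify_ends("ATTGGTTTTTAA"))  # ('ATT', 'TAA')
print(classify_ends("TTGGGTTTTT"))    # ('TTG', 'T')
print(classify_ends("ATAGGTTTTTA"))   # ('ATA', 'TA')
```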

Table 3 Summary of the mitochondrial genome of Protostrongylus rufescens

The codon usage for the 12 protein-encoding genes of P. rufescens was also compared with that of other metastrongyloid nematodes, Aelurostrongylus (Ae.) abstrusus, An. cantonensis, An. costaricensis and An. vasorum [12, 20, 28] (Table 4). All 64 codons were used. The preferred nucleotide usage at the third codon position of mt protein genes of P. rufescens reflects the overall nucleotide composition of the mt genome. At this position, T was the most frequently used nucleotide and C the least frequently used. For P. rufescens, the codons ending in A had higher frequencies than the codons ending in G, which is similar to, for example, other members of the order Strongylida and Caenorhabditis elegans (Rhabditida), but distinct from Ascaris suum (Ascaridida) and Onchocerca volvulus (Spirurida) [14–17, 30]. As the usage of synonymous codons is proposed to be preferred in gene regions of functional importance, codon bias appears to be linked to selection at silent sites and to translation efficiency [31, 32].

Table 4 Codon usages (%) in mitochondrial protein-encoding genes of Protostrongylus rufescens

The AT bias in the genome is also reflected in the amino acid composition of predicted proteins. The AT-rich codons represent the amino acids Phe, Ile, Met, Tyr, Asn and Lys, and GC-rich codons represent Pro, Ala, Arg and Gly. In the mt genome of P. rufescens, the most frequently used codons were TTT (Phe), TTA (Leu), ATT (Ile), TTG (Leu), TAT (Tyr), GGT (Gly), AAT (Asn) and GTT (Val). Six of these codons are AT-rich, and one of them is GC-rich. Seven of the eight codons contained an A or a T at two positions, except for GGT (Gly), which contained a T only in the third position. None of them had a C at any position. The least frequently used codons were CTC, CTG (Leu), GTC (Val), AGC (Ser), CCC (Pro), GCC (Ala), CAC (His), CGA (Arg), TCC (Ser), GGC (Gly) and ACC (Thr). All four GC-rich codons were represented here, and every codon had at least one C. When the frequencies of synonymous codons within the AT-rich group, such as Phe (TTT, 14.2%; TTC, 1.2%), Ile (ATT, 5.6%; ATC, 0.7%), Tyr (TAT, 5.6%; TAC, 0.9%) and Asn (AAT, 3.8%; AAC, 0.7%), were compared, the frequency was always lower if the third position was a C.
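Codon usage percentages of the kind shown in Table 4, and the nucleotide preference at the third codon position, can be derived as in the following Python sketch. It assumes in-frame coding sequences from which the caller has already trimmed the terminal stop codon; the function names and the toy genes are illustrative.

```python
from collections import Counter


def codon_usage(cds_list: list[str]) -> dict[str, float]:
    """Return each codon's share (%) of all codons in the given genes."""
    counts: Counter[str] = Counter()
    for cds in cds_list:
        cds = cds.upper()
        usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            counts[cds[i:i + 3]] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no complete codons found")
    return {codon: 100.0 * n / total for codon, n in counts.items()}


def third_position_usage(usage: dict[str, float]) -> dict[str, float]:
    """Sum codon percentages by the nucleotide at the third position."""
    third: Counter[str] = Counter()
    for codon, pct in usage.items():
        third[codon[2]] += pct
    return dict(third)


# Toy genes (invented): codons TTT, TTA, ATT, TTT and TTT, TTC.
usage = codon_usage(["TTTTTAATTTTT", "TTTTTC"])
by_third = third_position_usage(usage)
```

For the toy data, TTT accounts for half of all codons, and codons ending in T for two thirds, mirroring on a small scale the T preference described above.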

Transfer RNA genes

Twenty-two tRNA gene sequences were predicted in the mt genome of P. rufescens. These sequences ranged from 52 to 63 nt in length. The tRNA structures had a 7 bp amino-acyl arm, a 4 bp DHU arm, a 5 bp anticodon stem, a 7 base anticodon loop, a T always preceding an anticodon as well as a purine always following an anticodon. Twenty of the 22 tRNA genes (i.e. excluding the two trn S genes) have a predicted secondary structure with a 4 bp DHU stem and a DHU loop of 4–10 bases, in which the variable TψC arm and loop are replaced by a “TV-replacement loop” of 4–11 bases, in accordance with most nematodes whose mt genomes have been characterised [5]. The mt trn S for P. rufescens has a secondary structure consisting of a DHU replacement loop of 7 bases, 3 bp TψC arm, TψC loop of 4–6 bases and a variable loop of 3 bases, consistent with other members of the Chromadorea [6, 14, 20, 33], but different from the enoplid nematodes Trichinella murrelli and T. spiralis [29, 34, 35]. Overlaps of one to four nucleotides are found between the genes trn H and rrn L, nad 4L and trn W, trn Y and nad 1, and trn I and trn R within the mt genome of P. rufescens.

Amino acid sequence comparisons and genetic relationships of P. rufescens with metastrongyloid and other nematodes

The amino acid sequences predicted from individual protein-encoding mt genes of P. rufescens were compared with those of Ae. abstrusus, An. cantonensis, An. costaricensis, An. vasorum, Dictyocaulus viviparus and D. eckerti (Table 5). Pairwise comparisons of the concatenated sequences revealed identities of 37.0–92.4% between these species. Based on identity, COX1 was the most conserved protein, whereas NAD2 and NAD6 were the least conserved. Phylogenetic analysis of the concatenated amino acid sequence data for the 12 mt proteins showed that P. rufescens was more closely related to Ae. abstrusus, An. cantonensis, An. costaricensis and An. vasorum (pp = 1.00) than to M. pudendotectus and M. salmi (metastrongyloids) (pp = 1.00), to the exclusion of H. contortus, T. axei (trichostrongyloids), Anc. caninum, N. americanus (hookworms; ancylostomatoids) and O. dentatum and S. vulgaris (strongyloids) (Figure 2) (pp = 1.00).
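Pairwise identity between two aligned amino acid sequences can be calculated as in the Python sketch below. How gapped columns were treated is not stated in the text; skipping any column with a gap in either sequence is an assumption of this sketch, as are the function name and the toy sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return % identical residues over columns without a gap in either row."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == "-" or res_b == "-":
            continue  # assumption: gapped columns are not scored
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy aligned fragments (invented): 6 scored columns, 5 identical.
identity = percent_identity("MFLV-IG", "MFIVAIG")
```

Here the result is 5/6, or about 83.3%. Counting gapped columns as mismatches instead would lower the values, so the convention should be fixed before comparing figures across studies.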

Table 5 Pairwise comparison of the amino acid sequences of the 12 protein-encoding mitochondrial genes
Figure 2

Phylogenetic relationship of Protostrongylus rufescens with other nematodes. Concatenated amino acid sequence data for all protein-encoding mitochondrial genes of Protostrongylus rufescens (bold) and other metastrongyloids, including Aelurostrongylus abstrusus, Angiostrongylus cantonensis, An. costaricensis, An. vasorum, Metastrongylus pudendotectus and M. salmi (metastrongyloids), as well as other concatenated sequence data representing different superfamilies, including Ancylostoma caninum and Necator americanus (hookworms; ancylostomatoids); Haemonchus contortus and Trichostrongylus axei (trichostrongyloids); Oesophagostomum dentatum and Strongylus vulgaris (strongyloids); and Strongyloides stercoralis (a rhabditid outgroup) were analyzed using Bayesian inference. The numbers above each tree branch represent the statistical support for each node (based on posterior probability [pp] score). GenBank accession numbers are in round brackets.


The characterisation of the mt genome of P. rufescens provides genetic markers for future population genetic and systematic studies. As sequence variation in ITS-2 nuclear rDNA is usually low within most species of strongylid nematodes [36], mt DNA is better suited for assessing population genetic variation. Therefore, PCR-based analytical approaches, using cox 1, nad 1 and nad 4 (displaying varying levels of within-species divergence), could be used to study haplotypic variation in P. rufescens populations in sheep and goats and also in molluscan hosts. Given that species complexes are commonly encountered in bursate nematodes [1, 4, 36], it would be interesting to prospect for cryptic species, to assess whether distinct genotypes/haplotypes of P. rufescens exist in sheep and goats as well as snails [37], and to establish whether particular sub-populations of P. rufescens occur in particular environments or geographical regions/countries, and have particular patterns of transmission.

It would also be interesting to assess the genetic structure of P. rufescens populations, including mt DNA diversity within populations and gene flow among populations, using PCR-coupled mutation scanning and sequencing of selected mt gene regions (such as cox 1 and nad 4). Findings for this lungworm (with an indirect life cycle via a molluscan intermediate host) could be compared with those for D. viviparus (with a direct life cycle), which has been reported to have surprisingly low mt DNA diversity within populations and limited gene flow among populations [38, 39].

The complete mt genome of P. rufescens provides a basis for extended comparative mt genomic/proteomic analyses of other protostrongyloids of ruminants, including P. brevispiculum, P. davtiani, P. hobmaieri, P. rushi, P. skrjabini, P. stilesi, Cystocaulus ocreatus, Neostrongylus lineatus and Muellerius capillaris (the latter of which is a particularly pathogenic parasite in goats), and those of other animal hosts, such as lagomorphs and pinnipeds. The high phylogenetic signal of predicted mt proteomic datasets and the consistently high nodal support values in recent systematic analyses [6, 12, 27, 28, 33] provide an opportunity to reassess the evolutionary relationships of lungworms (order Strongylida). For example, the family Protostrongylidae is distinguished from other metastrongyloids by only a couple of morphological characters, i.e., the gubernaculum and telamon in adult male worms [40], and it is proposed that protostrongyloids of lagomorphs originated from their ancestors primarily infecting sheep, goats, antelopes and deer [41]. Analyses of inferred mt proteomic data sets from a range of protostrongyloids should allow relationships within the family Protostrongylidae and also the origin of the protostrongylids of lagomorphs to be assessed. In addition, there has been considerable debate as to the relationships among suborders within the Strongylida, based on the use of phenotypic characters [42]. On one hand, it has been hypothesized that the suborder Metastrongylina (to which species of Protostrongylus, Metastrongylus, Aelurostrongylus and Angiostrongylus belong) originated from ancestors in the Strongylina [43, 44] or Trichostrongylina [45, 46]. On the other hand, it has been proposed that the Metastrongylina gave rise to the Strongylina [47].
To date, molecular phylogenetic analyses of nuclear ribosomal DNA sequence data [48, 49] have suggested that the Trichostrongylina are basal to the Metastrongylina, which represent a monophyletic assemblage. However, Jex et al. [6], using mitochondrial sequence data, showed that the major suborders within the Strongylida (e.g., the Metastrongylina, Strongylina and Trichostrongylina) were each resolved as distinct, monophyletic clades with maximum statistical and nodal support (posterior probability = 1.00; bootstrap = 100). A detailed analysis using inferred mt proteomic data sets would allow an independent assessment of the systematic relationships of these suborders.


Comparative analyses of proteomic sequence datasets inferred from the mt genomes of P. rufescens and other lungworms indicate that P. rufescens is closely related to Ae. abstrusus, An. cantonensis, An. costaricensis and An. vasorum. The mt genome determined herein should provide a source of markers for future investigations of P. rufescens. Molecular tools, employing such mt markers, are likely to find applicability in studies of the population biology of this parasite and the systematics of lungworms.

Authors’ information

Abdul Jabbar and Namitha Mohandas shared first authorship.