The genome of Diuraphis noxia, a global aphid pest of small grains
The Russian wheat aphid, Diuraphis noxia Kurdjumov, is one of the most important pests of small grains throughout the temperate regions of the world. This phytotoxic aphid causes severe systemic damage symptoms in wheat, barley, and other small grains as a direct result of the salivary proteins it injects into the plant while feeding.
We sequenced and de novo assembled the genome of D. noxia Biotype 2, the strain most virulent to resistance genes in wheat. The assembled genomic scaffolds span 393 Mb, equivalent to 93% of its 421 Mb genome, and contain 19,097 genes. D. noxia has the most AT-rich insect genome sequenced to date (70.9%), with a bimodal CpG(O/E) distribution and a complete set of methylation-related genes. Relative to the Acyrthosiphon pisum genome, the D. noxia genome shows a widespread, extensive reduction in the number of genes per ortholog group, including defensive, detoxification, chemosensory, and sugar transporter groups; chemoreceptor genes alone are reduced by 65%. Thirty of 34 known D. noxia salivary genes were found in this assembly. Among these, genes commonly expressed in insect saliva, such as glucose dehydrogenase and trehalase, were less conserved relative to their A. pisum orthologs, whereas genes expressed in D. noxia saliva but not detected in the saliva of other insects were more conserved. Genes encoding insecticide targets and genes acquired by lateral transfer from bacteria were also found, as well as genes involved in virus transmission, although D. noxia is not a viral vector.
This genome is the second sequenced aphid genome, and the first of a phytotoxic insect. The reduced gene content of D. noxia may reflect the influence of phytotoxic feeding in shaping its genome and, in turn, in broadening its host range. The presence of methylation-related genes, including those for cytosine methylation, is consistent with findings in other parthenogenetic and polyphenic insects. The D. noxia genome will provide an important contrast to the A. pisum genome and advance functional and comparative genomics of insects and other organisms.
Keywords: Diuraphis noxia; Russian wheat aphid; Plant-insect interactions; Phytotoxic; Aphid; Genome
Abbreviations
G + C: Guanine + Cytosine
A + T: Adenine + Thymine
SNP: Single nucleotide polymorphism
FISH: Fluorescence in situ hybridization
HMM: Hidden Markov model
CEG: Conserved eukaryotic gene
SINE: Short interspersed nuclear element
LINE: Long interspersed nuclear element
LTR: Long terminal repeat
RWA: Russian wheat aphid
EST: Expressed sequence tag
NCBI: National Center for Biotechnology Information
OBP: Odorant binding protein
Aphids rapidly radiated as parasites of flowering plants following the spread and diversification of angiosperms 80 to 150 million years ago [1,2]. From that point forward, aphids developed host-specific relationships through the use of specialized piercing-sucking mouthparts that penetrate plant tissues to feed upon phloem sap. Key to this feeding process is the injection of saliva, which modulates plant defenses [3,4]. More than 5,000 aphid species exist, and over 100 species are economically important crop pests. The Russian wheat aphid, Diuraphis noxia Kurdjumov, gained recognition as a global pest of wheat when it rapidly expanded its range from Central Asia and Europe to most of the wheat-producing continents over a 15-year period beginning in the early 1970s [7,8]. Losses in wheat exceeded $986 million over the first 10 years after this aphid invaded the United States in 1986.
The genome of the pea aphid, Acyrthosiphon pisum, is currently the sole genomic model available for the study of aphid biology, genetics, and aphid-plant interactions. A. pisum and D. noxia share many biological traits common to the family Aphididae. However, a phylogenetic analysis of Buchnera aphidicola sequences from a large sample of aphid species indicated that D. noxia diverged early in the evolution of the tribe Macrosiphini (subfamily Aphidinae), which also includes A. pisum, and went on to develop unique host preferences and feeding relationships. The majority of aphids, including A. pisum, cause minor damage to their host plants by imposing a metabolic burden through the constant removal of phloem sap [3,4,12,13]. In contrast, D. noxia represents an economically important group of aphids whose saliva induces rapid, direct, and systemic phytotoxic effects in the host plant, including chlorosis, loss of turgor, abnormal leaf growth, and necrosis [3,14]. A. pisum is a well-known vector of plant viruses and has expanded its host range among legumes through the development of host races, each specific to a plant species [16,17]. D. noxia is not a vector of plant viruses, and it feeds upon over 140 species in 40 genera of graminaceous plants, including wheat and barley. This species can develop virulent strains, termed biotypes, in response to single-gene resistance in wheat [20-22], following the virulence gene-resistance gene (gene-for-gene) model often associated with plant-parasite relationships [23-25]. No additional D. noxia-resistant wheat cultivars have been released since 2003, when D. noxia Biotype 2 overcame Dn4 gene-based resistance in wheat. Although D. noxia is generally known to reproduce sexually, Biotype 2 is strictly parthenogenetic and is a highly successful isofemale component of the genotypically diverse D. noxia population in the United States.
We present this draft version of the D. noxia genome as the first crucial step in the study of phytotoxic aphid-plant interactions and the virulence genes that overcome resistance genes in wheat. The advancement of a phytotoxic aphid model will increase the understanding of how virulence genes and their products neutralize host plant resistance genes and the underlying mechanisms of the different aphid-host interactions. Further, the D. noxia genome provides an exceptional contrast to A. pisum that will facilitate functional and comparative genomics studies of aphids and advance the science of how insects adapted to perform their specialized roles in the environment.
Results and discussion
Quality-filtered and Buchnera-filtered sequencing data used to assemble the D. noxia Biotype 2 genome
Number of reads (×10^6)
Read length (bp)
Fragment length (bp)
Total coverage (Gbp)
2 x 101
Mated-Pair 2.5 kb
2 x 101
RWA MP 8 kb
2 x 101
2 x 101
D. noxia de novo genome assembly statistics
Number of Contigs
85,990 (≥200 bp)
Number of Scaffolds
Total Contig Length
Total Scaffold Length
Largest Contig (bp)
Largest Scaffold (bp)
29.06% GC/70.94% AT
32.8% GC/67.2% AT
CEGMA genes (complete/partial)
The D. noxia genome is composed of 29.1% G + C and 70.9% A + T, the lowest G + C percentage of any currently assembled insect genome, including A. pisum (29.6% G + C). The median G + C composition of all identified D. noxia transcripts, discussed below, is 39.3% (range 21.4% to 72.0%), compared with medians of 38.8% in A. pisum and 38.6% in Apis mellifera. The high A + T content of both D. noxia and A. pisum contradicts the hypothesized positive correlation between insect genome size and A + T content.
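The genome-wide figure (29.1% G + C) and the transcript figure (median 39.3%) are different statistics: the first pools every base in the assembly, the second takes the median of per-transcript values. The minimal Python sketch below, which is not the authors' pipeline, computes both from plain sequence strings; excluding N and other ambiguity codes from the denominator is an assumption.

```python
from statistics import median


def base_counts(seq: str) -> tuple[int, int]:
    """Return (G + C count, A + C + G + T count); N and other codes are ignored."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    acgt = gc + s.count("A") + s.count("T")
    return gc, acgt


def pooled_gc(seqs) -> float:
    """Length-weighted G + C fraction over all sequences (genome-wide style)."""
    gc_total = acgt_total = 0
    for s in seqs:
        gc, acgt = base_counts(s)
        gc_total += gc
        acgt_total += acgt
    return gc_total / acgt_total


def median_gc(seqs) -> float:
    """Median of per-sequence G + C fractions (transcript-set style)."""
    return median(gc / acgt for gc, acgt in map(base_counts, seqs) if acgt)
```

For the two toy sequences "AAAAAAAAGC" and "GGCC", the pooled value is 6/14 (about 0.43) while the median of per-sequence values is 0.6, which illustrates why a genome-wide and a per-transcript summary can diverge.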
The single nucleotide polymorphism (SNP) rate within the D. noxia assembly was 0.45%, most likely attributable to the heterozygous chromosomal state perpetuated by the strict parthenogenetic reproduction of D. noxia Biotype 2. Because the experimental population consisted of the offspring of one female aphid, chromosomal heterozygosity was preserved in this clonal population. The SNP rate of D. noxia is similar to that of other insects [30,31], falls below the ≤1% threshold of typical allelic variance, and confirms the existence of chromosomal heterozygosity in Biotype 2, as has been noted in other invasive clonal aphid lineages.
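The text does not specify how the 0.45% rate was normalized. A common convention, assumed here, divides the number of heterozygous sites by the number of callable bases. The sketch below reads VCF-style genotype strings such as "0/1" or "0|1"; the function names and inputs are hypothetical.

```python
def heterozygous_site_rate(genotypes, callable_bases: int) -> float:
    """Fraction of callable bases at which the genotype carries two different alleles.

    genotypes: iterable of VCF-style GT strings for variant sites, e.g. "0/1", "1/1", "0|1".
    Missing calls such as "./." are skipped.
    """
    het = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            continue  # no call at this site
        if len(set(alleles)) > 1:
            het += 1  # e.g. 0/1: heterozygous in a diploid clone
    return het / callable_bases


def below_allelic_threshold(rate: float, threshold: float = 0.01) -> bool:
    """Compare a rate with the <=1% allelic-variance threshold cited in the text."""
    return rate <= threshold
```

Homozygous-alternate calls (1/1) are deliberately not counted, because in a clonal diploid line only heterozygous sites reflect the preserved chromosomal heterozygosity.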
The telomeric repeat (TTAGG)n common to insects [10,33,34] was not found in D. noxia, supporting the findings of Novotna et al., who were unable to detect common telomere sequences in this aphid by fluorescence in situ hybridization (FISH). However, RNAseq read mapping revealed the expression of six genes encoding telomere-related proteins in the D. noxia genome (Additional file 1: Table S1), suggesting the existence of modified telomeric repeat sequences. The lack of classical telomeric sequences is not surprising, as altered telomeric sequences, or their replacement by retrotransposons and satellite repeats, have been reported in several other unrelated insect species [33-36].
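In silico, the absence of canonical telomeres can be checked by scanning scaffolds for tandem runs of the insect motif TTAGG on either strand. The paper does not say how its sequence search was done; the Python sketch below is one simple, assumed approach. It reports only the longest perfect run, so a modified or interrupted repeat would go undetected.

```python
import re


def longest_telomeric_run(seq: str, unit: str = "TTAGG") -> int:
    """Longest perfect tandem run of `unit` (or its reverse complement), in repeat units."""
    complement = str.maketrans("ACGT", "TGCA")
    rc_unit = unit.translate(complement)[::-1]  # TTAGG -> CCTAA on the opposite strand
    best = 0
    for motif in {unit, rc_unit}:
        pattern = f"(?:{re.escape(motif)})+"  # one or more back-to-back copies
        for match in re.finditer(pattern, seq.upper()):
            best = max(best, len(match.group(0)) // len(motif))
    return best
```

A scaffold end returning 0 or 1 is consistent with the FISH result, whereas canonical insect telomeres would typically show long runs.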
The completeness of the D. noxia genome was assessed using a hidden Markov model (HMM)-based search (CEGMA) of the genome scaffolds and assembled transcripts for members of the Conserved Eukaryotic Gene set (CEG, n = 248), which are expected to be present in all eukaryotes. CEGMA analysis determined that the D. noxia genome assembly contains 94.4% of the CEG set: 214 complete and 20 partial CEGs, for a total of 234. CEGMA analysis of the predicted D. noxia transcriptome found 247 complete CEGs, or 99.6% of the set (Table 2). The identification of 94% of CEGs strongly supports our estimate that the assembly covers 93% of the genome, with the remaining gaps likely due to repetitive regions that are recalcitrant to assembly.
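The completeness figures follow from simple ratios against the 248-gene CEG set, and the 93% assembly estimate from 393 Mb of scaffolds over a 421 Mb genome. A small Python check using only numbers stated in the text:

```python
CEG_SET_SIZE = 248  # size of the Conserved Eukaryotic Gene set used by CEGMA


def percent(part: float, whole: float, ndigits: int = 1) -> float:
    """Percentage of `whole` represented by `part`, rounded for reporting."""
    return round(100 * part / whole, ndigits)


genome_complete, genome_partial = 214, 20
transcriptome_complete = 247

genome_ceg_pct = percent(genome_complete + genome_partial, CEG_SET_SIZE)  # complete + partial
transcriptome_ceg_pct = percent(transcriptome_complete, CEG_SET_SIZE)
assembly_span_pct = percent(393, 421, ndigits=0)  # scaffold span / estimated genome size (Mb)
```

Note that the genome figure counts partial CEGs, so it is an upper bound on the fraction of conserved genes that are fully assembled.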
CpG dinucleotides and cytosine methylation
Cytosine methylation is the definitive mark of epigenetic regulation in eukaryotes, but in insects it occurs only in the CpG context. While DNA methylation is present in most insects, it is only rarely observed in the holometabolous orders Coleoptera and Diptera, and it is suspected to be undergoing evolutionary loss in these orders [39,40]. Among hemimetabolous insects, A. pisum and Pediculus humanus each display evidence of cytosine methylation, but P. humanus lacks the de novo methyltransferase Dnmt3. Epigenetic mechanisms regulate polyphenism in insects [41,42], and their presence is signified by a bimodal distribution of observed/expected CpG ratios (CpG(O/E)) [38,42,43]. Bimodally distributed CpG(O/E) ratios indicate the existence of heavily and lightly methylated gene groups, with low and high CpG(O/E) ratios, respectively. The CpG(O/E) ratios of the two gene groups diverge because CpG dinucleotides are depleted over time by the spontaneous deamination of 5-methylcytosine to thymine, a process that occurs in all eukaryotes [42-45].
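CpG(O/E) compares the observed count of CpG dinucleotides with the count expected from a sequence's C and G content. The sketch below uses the widely used form CpG(O/E) = N(CpG) × L / (N(C) × N(G)), where L is sequence length; the paper does not state its exact formula, so this choice, and returning NaN for sequences lacking C or G, are assumptions.

```python
import math


def cpg_oe(seq: str) -> float:
    """Observed/expected CpG ratio: N(CpG) * L / (N(C) * N(G)).

    Returns NaN when the sequence has no C or no G, since the expectation is zero.
    """
    s = seq.upper()
    n_c, n_g = s.count("C"), s.count("G")
    if n_c == 0 or n_g == 0:
        return math.nan
    n_cpg = s.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    return n_cpg * len(s) / (n_c * n_g)
```

Converting CpG sites to TpG, as deamination of methylated cytosine does, lowers the ratio step by step: "CACGTGCACGTG" gives 1.5, "CATGTGCACGTG" (one site converted) gives 1.0, and "CATGTGCATGTG" (both converted) gives 0.0. Genes methylated over long periods therefore drift toward the low-O/E mode of the bimodal distribution.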
Transposable and repetitive elements
Transposable and repetitive elements are a major component of most insect genomes, although the proportion of the genome they occupy varies by species. Expansions of these elements increase genome size and may be responsible for speciation events among isolated populations [46-48]. Conversely, small genomes show reduced proportions of repetitive elements, potentially as a result of the loss of inefficient genomic elements while a functional gene complement is maintained [31,34,49].
Summary of transposable and repetitive elements in the D. noxia genome
Number of elements
Percentage of genome (A)
Percentage of genome (B)
Total Interspersed Repeats
Gene and protein model prediction
Evidence-based and ab initio gene and protein predictions
Gene modeling software
Ave./median protein length
Ave./median transcript length
Total number of amino acids
189 / 138
576 / 420
10,278 / 37
Ab Initio plus Evidence
439 / 320
1,694 / 1,251
29,663 / 66
345 / 241
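The average/median length and total residue columns of the table above are simple summaries over a FASTA file of predicted models. A minimal sketch, assuming standard FASTA input with an optional trailing stop symbol "*" that is not counted as a residue:

```python
from statistics import mean, median


def read_fasta(lines):
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []  # keep the first word as the ID
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def length_summary(lines):
    """Return (number of models, mean length, median length, total residues)."""
    lengths = [len(seq.rstrip("*")) for _, seq in read_fasta(lines)]
    return len(lengths), mean(lengths), median(lengths), sum(lengths)
```

Reporting the median alongside the mean is useful here because a few very long models can pull the average well above the typical model length.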
Of the 19,097 predicted D. noxia genes and their corresponding protein models, 4,867 (25.4%) produced no BLASTP hits (E ≤ 1E−15) against the NCBI Insecta RefSeq dataset. Similarly, 4,898 D. noxia proteins (25.6%) were not mapped to orthologous sequences by OrthoMCL. A BLASTN search (E ≤ 1E−15) of D. noxia transcripts against the NCBI Insecta RefSeq gene dataset (obtained 05/07/2014) likewise identified 4,867 (25.4%) D. noxia transcripts as unique to the species. RNAseq read mapping revealed that 2,624 (53.9%) of these unique genes were detectably expressed, while 2,243 were not (Additional file 5: Table S4). This percentage of unique genes is greater than that of any insect genome published to date. However, a similar percentage of unique genes was observed in the Hessian fly Mayetiola destructor, a gall-forming dipteran wheat pest (personal communication, Stephen Richards). Both M. destructor and D. noxia alter wheat morphology and physiology, although through different mechanisms, and their large percentages of unknown genes may reflect highly evolved parasitic gene-for-gene relationships with their hosts [57,58].
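Identifying genes with no hit at the E ≤ 1E−15 cutoff amounts to collecting every query that has at least one qualifying hit and reporting the rest. The authors ran BLAST within CLC Genomics Workbench; the sketch below instead assumes standalone BLAST+ default tabular output (-outfmt 6), whose 11th column is the E value, and the gene IDs are hypothetical.

```python
import csv

EVALUE_COLUMN = 10  # 11th field of BLAST+ -outfmt 6 (0-based index 10)


def queries_without_hits(query_ids, tabular_lines, max_evalue: float = 1e-15):
    """Return query IDs (in input order) with no hit at or below `max_evalue`.

    tabular_lines: lines of BLAST+ tabular output (-outfmt 6), whose 12 columns are
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore.
    """
    hit = set()
    for row in csv.reader(tabular_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        if float(row[EVALUE_COLUMN]) <= max_evalue:
            hit.add(row[0])
    return [q for q in query_ids if q not in hit]
```

Queries absent from the BLAST output entirely (like a gene with no hits at all) are reported as well, because the function starts from the full list of query IDs rather than from the hit table.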
Orthology between species
Orthology analysis of the 19,097 predicted D. noxia proteins was performed with OrthoMCL against the 150-species OrthoMCL database. We assigned 13,402 D. noxia proteins (70.2%) to 7,422 ortholog groups, comprising 5,416 single-copy and 7,986 multi-copy orthologs; a further 797 proteins matched unassigned orthologs, for a total of 14,199 ortholog group matches. The remaining 4,898 unmatched proteins were mostly hypothetical proteins (Additional file 6: Table S5 and Additional file 7: Table S6). Most of the 14,199 matched proteins were closest to A. pisum proteins (81.65%), followed by those of other arthropods: P. humanus (3.52%), B. mori (2.46%), A. mellifera (2.20%), Ixodes scapularis (1.41%), Culex pipiens (1.25%), Aedes aegypti (1.11%), D. melanogaster (0.88%), and Anopheles gambiae (0.82%) (Additional file 8: Figure S2). Primary matches to 59 additional organisms made up only 4.70% of the total known orthology designations. Among unmatched proteins, 2,649 individual paralog pairs (Additional file 9: Table S7) were identified, which grouped into 357 in-paralog families containing 1,337 proteins (Additional file 10: Table S8). The three largest in-paralog families contained 35 proteins each, and the smallest families (207 in total) contained two proteins each. In-paralog families were identified through comparisons to 150 separate species to maximize discrimination and produce the most D. noxia-specific in-paralog groups possible.
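Grouping paralog pairs into families amounts to finding connected groups in a graph whose edges are the pairs. OrthoMCL itself applies Markov clustering (MCL) to a weighted similarity graph; the sketch below substitutes plain connected components via union-find, a simpler technique that can merge families MCL would keep apart, so it illustrates the idea rather than reproducing the 357 families reported here. Protein IDs are hypothetical.

```python
from collections import defaultdict


def paralog_families(pairs):
    """Group paralog pairs into families (connected components), largest first."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    families = defaultdict(list)
    for protein in parent:
        families[find(protein)].append(protein)
    return sorted((sorted(members) for members in families.values()), key=len, reverse=True)
```

Because single linkage joins any two families that share one pair, a connected-components count is a lower bound on the number of families an MCL-based method would report from the same pairs.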
Direct examination of orthologous relationships among the examined species (Figure 3B) determined that, of the 3,839 ortholog groups (OGs) common to all of them, 401 OGs were present in 1:1:1 relationships and 145 OGs in N:N:N relationships in every species, allowing no gene losses within individual species. The remaining 3,293 OGs were present in either single or multiple copies in each species and were classified as common orthologs. OGs with losses in one or more species, including species-specific OGs, were classified as patchy orthologs; this class includes 752 OGs unique to insects, with varying numbers of members in each species, while 2,011 OGs (4,454 proteins) were present only in I. scapularis. The remaining proteins of each species were classified either as homologous proteins not yet placed into orthologous groups or as unclassified proteins with no acceptable match in the orthology database. The pattern of orthology classification in D. noxia is similar to that of other insect species, but with a larger percentage of unclassified genes [10,28,31,34,53,56,60,61]. By disallowing ortholog group losses, we present the strictest representation of orthologous relationships.
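The classification can be stated as a function of per-species copy numbers. The rules below are one reading of the text, assumed rather than taken from the authors' scripts: N:N:N is taken to mean multiple copies in every species, and any group with a zero count is treated as patchy. Species keys are hypothetical.

```python
from collections import Counter


def classify_og(counts: dict[str, int]) -> str:
    """Classify one ortholog group from its per-species copy numbers (assumed rules)."""
    copies = list(counts.values())
    if any(n == 0 for n in copies):
        return "patchy"  # lost in at least one species (includes species-specific OGs)
    if all(n == 1 for n in copies):
        return "1:1:1"  # single copy in every species
    if all(n > 1 for n in copies):
        return "N:N:N"  # multiple copies in every species
    return "common"  # present everywhere, with mixed copy numbers


def tally(ogs) -> Counter:
    """Count classes over an iterable of per-species copy-number dicts."""
    return Counter(classify_og(og) for og in ogs)
```

Under these rules 401 + 145 + 3,293 = 3,839, matching the number of groups present in all species.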
A. pisum is thought to have undergone extensive gene duplication during its evolution, which our lineage-specific expansion (LSE) comparisons with D. noxia affirm. The general decrease in duplications per ortholog group, and the lower number of ortholog groups, in D. noxia relative to A. pisum suggest that the D. noxia genome has undergone relatively less alteration over the course of its evolution. The relative lack of gene duplications and expansions in D. noxia may indicate that it maintains and expands its host range by means other than genomic alteration or gene family expansion [47,48,62,63].
Aphid feeding requires a balance of specific salivary components to suppress or mitigate plant defenses throughout the stylet probing and feeding processes, allowing sustained feeding on host plant phloem [64,65]. The invasive nature of plant feeding by aphids requires the expression of an array of salivary and metabolic genes that act upon the plant and protect the aphid from plant defensive proteins and xenobiotics [3,64-69]. D. noxia differs from most aphids in that the saliva it injects while feeding produces phytotoxic symptoms that alter plant morphology and progressively damage the host to enrich phloem nutrition [14,69-71]. Given the differences in host range among aphid species, feeding-related genes are expected to vary among and within species; accordingly, salivary protein profiles are distinct to aphid species, biotypes, and host races [69,72-76].
We found 29 of the 34 salivary genes previously detected in proteomic analyses of four D. noxia biotypes in this genome assembly. The five genes not detected were the D. noxia orthologs of GJ23220, IscW_ISCW012834, IP06594, Lava Lamp, and mitochondrial cytochrome c oxidase subunit I (COI). The mitochondrial COI gene was, however, present among the RNAseq-predicted transcripts; it was excluded from the genome assembly by the high-molecular-weight DNA extraction method used. The remaining absent proteins may lie in unassembled portions of the D. noxia genome, or their sequences may differ substantially outside the originally identified peptides.
A BLASTP comparison of each predicted D. noxia salivary protein sequence against the NCBI Insecta RefSeq protein database revealed that each D. noxia salivary protein was more closely related to an A. pisum counterpart than to proteins from any other species, with E values ranging from 0.00 to 6.22E−74 and identities ranging from 58.21% to 100% (Additional file 15: Table S12 and Additional file 16: Table S13). The level of homology between D. noxia salivary protein sequences and their A. pisum orthologs varied inversely with the apparent abundance of each protein in the saliva. Common insect salivary proteins such as glucose dehydrogenase, trehalase, and apolipophorin were among the proteins with the least homology to their A. pisum orthologs. In contrast, D. noxia salivary proteins not observed in the saliva of other insects exhibited greater homology with orthologs from A. pisum and other insect species (Additional file 15: Table S12 and Additional file 16: Table S13) [69,73]. This finding suggests that salivary gene expression, rather than sequence divergence, may play a role in D. noxia’s host specificity and phytotoxicity.
Glucose dehydrogenase and apolipophorin are among the most common and abundant proteins in aphid saliva [66,69,73,74]. Multiple glucose dehydrogenase proteins are present in aphid saliva, and their differing amino acid compositions suggest that each performs a different function within the plant host. Apolipophorin, present as a single gene copy in D. noxia, A. pisum, and most other insect species, was used to examine the phylogenetic relationship of D. noxia with other arthropods from the perspective of a conserved single-copy gene. A maximum-likelihood phylogenetic tree derived from a MUSCLE alignment of apolipophorin from eleven arthropod species confirmed known phylogenetic patterns, with basal branching of the aphid lineage relative to the Holometabola and a more recent divergence of D. noxia and A. pisum (Additional file 17: Figure S4).
Defensive and detoxifying genes
Chemoreception genes are critical for perceiving taste and odor stimuli in order to locate appropriate food sources and establish feeding. Duplication or mutation of chemoreceptor genes can alter feeding behavior and has been implicated in insect speciation [48,62,63] and in establishing host range. The D. noxia genome contains 30 gustatory receptors (GRs), 21 odorant receptors (ORs), and 9 odorant binding proteins (OBPs) (Additional file 16: Table S13), whereas A. pisum has 77 GRs, 79 ORs, and 15 OBPs, and Aphis gossypii, a generalist feeder, has 45 ORs (its GR and OBP numbers have not been reported). The human louse P. humanus, another hemimetabolous insect, has only 10 ORs, 5 OBPs, and 8 GRs, a condition suspected to result from host range restriction. Omnivorous insect species have far more chemoreceptors: the omnivorous T. castaneum possesses 265 ORs and 220 GRs, the housefly Musca domestica has 52 OBPs, 62 ORs, and 68 GRs, and the hymenopteran nectar-feeder A. mellifera has 170 ORs and 21 OBPs, but only 10 GRs. Comparison of OR numbers across insect species is complicated by the fact that ORs include receptors for sex pheromones, which are essential to reproduction. Sequence identity between D. noxia ORs and their A. pisum counterparts was highly variable, ranging from 28% to 95%. Substantial sequence variation was also noted between A. gossypii and A. pisum ORs, indicating their potential role in host selection. The scarcity of D. noxia chemoreceptors relative to A. pisum and A. gossypii suggests that taste and odor perception may be less important in food source selection for D. noxia. The reduction in chemoreceptor numbers is consistent with D. noxia relying upon phytotoxic salivary proteins to overcome host defenses and enhance the nutritional value of its hosts, which would reduce its reliance on chemoreceptors to identify suitable hosts and help broaden its host range [70,71].
Aphids consume a sugar-rich diet with a high osmotic potential and therefore require only passive transport proteins, such as uniporters, that move phloem sugars down the membrane concentration gradient and into the hemolymph. The D. noxia genome contains a number of sugar transporters, including 84 major facilitator superfamily genes, compared with 200 in A. pisum, and 13 inositol/glucose/sugar transporters, versus 34 in A. pisum (Additional file 16: Table S13). The relative increase in A. pisum sugar transporters compared with other sequenced insects is hypothesized to reflect adaptation to a sugar-rich diet. The lower number of sugar transporters in D. noxia relative to A. pisum indicates that sugar transporter gene expansion is not universal among aphids and may vary with the hosts they use.
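The abstract's 65% reduction in chemoreceptor genes can be reproduced from the counts given in the two paragraphs above, but only if OBPs are pooled with GRs and ORs; GRs and ORs alone give about 67%. That pooling is an inference from the numbers, not a definition stated by the authors. The sketch also computes the sugar transporter reductions.

```python
# (D. noxia, A. pisum) gene counts as reported in the text
FAMILY_COUNTS = {
    "gustatory receptors": (30, 77),
    "odorant receptors": (21, 79),
    "odorant binding proteins": (9, 15),
    "major facilitator superfamily": (84, 200),
    "inositol/glucose/sugar transporters": (13, 34),
}


def percent_reduction(d_noxia: int, a_pisum: int) -> float:
    """Percentage by which the D. noxia count falls short of the A. pisum count."""
    return 100 * (1 - d_noxia / a_pisum)


def pooled_reduction(families) -> float:
    """Reduction computed on counts summed across several families."""
    dn = sum(FAMILY_COUNTS[f][0] for f in families)
    ap = sum(FAMILY_COUNTS[f][1] for f in families)
    return percent_reduction(dn, ap)


chemosensory = ["gustatory receptors", "odorant receptors", "odorant binding proteins"]
```

Pooling the three chemosensory families gives 60 versus 171 genes (about 65%); the major facilitator and inositol/glucose/sugar transporter families are reduced by about 58% and 62%, respectively.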
RNAi and epigenetic pathways
The RNA regulatory pathway, which includes the RNA interference (RNAi) and epigenetic regulatory pathways, functions in viral defense and gene regulation by degrading aberrant RNA and by establishing and maintaining DNA and chromatin methylation. Not all of these mechanisms are present in every insect lineage [41,42,82]; DNA methylation, for example, is notably lacking in D. melanogaster. Regulation of gene expression by DNA methylation is an essential aspect of polyphenism in aphids and other insects [41,42]. D. noxia possesses the components of the common insect RNAi and epigenetic pathways [41,82-84]. Single copies of the genes SID1, AGO3, DCR-1, DCR-2, Drosha, Pasha, vacuolar H+-ATPase, Exportin-5, HEN1, Loquacious, and R2D2 were found, along with five PIWI, two PRMT-5, two AGO1, and two AGO2 genes (Additional file 16: Table S13). Genes required for epigenetic DNA and chromatin modification were also present, including six type 1 and type 3 DNA methyltransferases, 16 histone-lysine methyltransferases, and 10 histone deacetylases (Additional file 16: Table S13). The presence of RNAi, DNA methylation, and chromatin methylation pathway components in D. noxia, together with a bimodal CpG(O/E) distribution (Figure 1), indicates that D. noxia genes are subject to regulatory methylation similar to that in A. pisum and A. mellifera [38,43].
Insecticide resistance pathways
Most insecticides target specific protein motifs and lose efficacy when mutations or alternate isoforms of the target protein prevail throughout a pest population. D. noxia is resistant to many insecticides in comparison to other insects, but it is effectively controlled by systemically applied pyrethroid, organophosphate, and organochlorine insecticides. No new insecticide resistance has been reported in D. noxia, but the aphids Myzus persicae, Aphis gossypii, and Schizaphis graminum have each developed resistance to several previously effective insecticides [87-89].
D. noxia possesses common insecticide targets, including an acetylcholinesterase-1 ortholog carrying the pirimicarb-susceptible S431 residue, four additional acetylcholinesterases, 21 acetylcholine receptors, 12 sodium channel genes, and five GABA receptors, but it lacks the neonicotinoid-detoxifying cytochrome P450 genes CYP2A6 and CYP6CY3 (Additional file 15: Table S12). The absence of reported insecticide resistance in D. noxia is likely due to past reliance upon host plant resistance rather than insecticides. However, D. noxia displays significant chromosomal heterogeneity and rapid biotype development under the selection pressure of plant resistance genes, making it likely that genetically based insecticide resistance could arise under strong selection pressure. The smaller complement of detoxifying genes in D. noxia relative to other insects, exemplified by the absence of CYP2A6 and CYP6CY3, further suggests that such resistance would most likely arise from a mutation-based sequence shift rather than from amplified expression of a rare transcript, although both mechanisms are possible.
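Whether a residue such as S431 in acetylcholinesterase-1 is the susceptible or a resistant variant is read off an alignment by mapping a reference position to the aligned query column. The sketch assumes gapped, equal-length alignment strings (for example from MUSCLE) and 1-based numbering on the ungapped reference; which reference sequence defines position 431 is not stated in the text, and the function name is hypothetical.

```python
def residue_at(ref_aln: str, query_aln: str, ref_pos: int) -> str:
    """Query residue aligned to 1-based position `ref_pos` of the ungapped reference.

    Returns "-" if the query has a gap at that column.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    pos = 0
    for ref_char, query_char in zip(ref_aln, query_aln):
        if ref_char != "-":  # only reference residues advance the numbering
            pos += 1
            if pos == ref_pos:
                return query_char
    raise IndexError("reference position lies beyond the alignment")
```

For the toy alignment residue_at("MK-SA", "MKTFA", 3), the query insertion at column 3 is skipped and the function returns "F", the residue opposite reference S3; a susceptible allele would instead return "S".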
Most aphid-related plant damage results from plant virus transmission during feeding, and most grain aphid species are significant vectors of barley yellow dwarf virus. D. noxia is exceptional in that it does not transmit plant pathogenic viruses. Nevertheless, the D. noxia genome possesses a full complement of proteins thought to be involved in viral transmission, including 10 dynamins, 8 serine protease inhibitors, 8 vesicle transport/trafficking proteins, and 15 cyclophilins [10,15] (Additional file 16: Table S13). Because viruses interact with specific epitopes of proteins involved in transmembrane transport, sequence differences in these proteins between D. noxia and virus-transmitting aphids may not favor viral attachment. The inability of D. noxia to vector viruses requires further exploration.
Genes laterally transferred from bacteria
Aphids are obligate parasites that are able to feed upon nutritionally deficient phloem sap through an endosymbiotic relationship with Buchnera aphidicola. These bacteria are housed within specialized host cells, bacteriocytes, and produce essential amino acids lacking in host plant phloem. B. aphidicola displays limited sequence and gene copy number variation between D. noxia biotypes, and variation in total endosymbiont and plasmid copy number is hypothesized to affect aphid fitness [92,93]. The D. noxia genome contains genes of bacterial origin that represent horizontal gene transfer into the D. noxia genome. These include one LD-carboxypeptidase and one rare lipoprotein A (RlpA) gene (Additional file 16: Table S13), as found in A. pisum [10,94,95], but not the acetylmuramidases noted in A. pisum. Each of these genes was located within a long contig (>5,000 bases) that also included D. noxia genes not of bacterial origin. As in A. pisum, there is no evidence of extensive horizontal gene transfer in the D. noxia genome. The DNA extraction method and pre-assembly read filtering removed reads matching the A. pisum-derived B. aphidicola genome assembly, eliminating the endosymbiont genome from our analysis, as supported by the absence of mitochondrial sequence in this assembly; the endosymbiont genome is therefore not addressed here.
The D. noxia genome shares many genes with the current model aphid, A. pisum, but differs in genome size and architecture and in specific functional genetic processes. With its moderate transposable and repetitive element content and fewer total genes and gene families than A. pisum, the D. noxia genome makes a case for a high degree of genomic conservation over time. The lower repetitive element percentage in the D. noxia genome may contribute to its fewer gene family expansions relative to A. pisum, consistent with the hypothesis that insect evolution is driven by transposable element expansion and gene duplication [10,53,55,63]. The D. noxia genome also differs from that of A. pisum primarily in genes governing host detection, acceptance, and feeding. This genome assembly describes D. noxia as a species uniquely adapted to feed upon graminaceous hosts by using its salivary proteins to alter host morphology and metabolism [69-71], and it provides an important contrast to non-phytotoxic aphids that depend on metabolically countering plant defensive compounds [3,66,67].
D. noxia possesses few chemoreceptor genes compared with other insects [10,53,55,60,80], suggesting low reliance on taste and odor perception for survival. It also has substantially fewer detoxifying and defensive genes than A. pisum and other insects [10,33,81], implying that D. noxia has evolved another way to circumvent host defenses. The relatively wide host range of D. noxia and its rapid establishment in new geographical areas suggest that its deficiencies in feeding-related genes relative to A. pisum are compensated for, and overcome by, phytotoxic salivary proteins that drive phloem nutrition enrichment and alter host morphology [14,69-71]. Aphids that cause phytotoxic reactions in plants are uncommon; D. noxia is thus an exception to the typical view of insect-plant coevolution, in which aphid evolution is thought to be driven by the need to avoid or detoxify newly evolving plant defensive responses in order to feed without damaging the host [96,97]. D. noxia presents a more rapacious character, surviving by inducing phytotoxic symptoms that damage and eventually destroy its host.
Our assembly presents a phytotoxic aphid model as an alternative genomic model for aphids and represents the second sequenced aphid genome. The contrasting and divergent evolutionary paths of D. noxia and A. pisum, and their contrasting aphid-host relationships, provide an extraordinary opportunity to better address the genetic basis of the feeding processes of aphids and their ability to evade plant defenses, to understand the nature of interactions between aphid virulence genes and plant resistance genes, and to formulate comparative and functional genomics studies that will ultimately lead to increased knowledge of aphid biology and evolution.
DNA and RNA collection, sequencing, and assembly
Chromosomal DNA was collected using the Agilent DNA extraction kit from a pooled sample of 200 Diuraphis noxia Biotype 2 adult females isolated from a single clone-derived colony, obtained from the USDA-ARS Cereal Insects Genetic Resource Library (CIGRL, Stillwater, OK) and reared on wheat cv. TAM110. Total RNA was extracted from 200 pooled RWA2 adult females from the same source using the Promega SV Total RNA Isolation System. Recovered DNA and RNA were immediately frozen at −80°C and used in subsequent sequencing analyses. The recovered DNA was sheared for paired-end and mated-pair libraries (Covaris S2; paired-end: peak power 50.0, duty factor 10.0, 200 cycles per burst, 90 s per run; mated-pair: 20% duty cycle, intensity 0.1, 100 cycles per burst, 5 min per run) and purified (paired-end: Dynal M-280 streptavidin magnetic beads; mated-pair: Agencourt AMPure XP beads). Paired-end library fragments were then end-repaired, A-tailed, ligated to adapters, and amplified by PCR (98°C for 30 s; 18 cycles of 98°C for 10 s, 65°C for 30 s, and 72°C for 30 s; a final step of 72°C for 15 min; then 4°C until retrieved). Agencourt AMPure XP beads were used for purification following PCR. Sequencing was performed on an Illumina HiSeq 2000 with TruSeq v3.0 chemistry. Paired-end fragments, prepared by the U.S. National Institutes of Health/National Cancer Institute (NIH/NCI), averaged 223 bases, with a read length of 2 × 101 bases. A mated-pair library prepared by the NIH/NCI averaged 2.6 kb, also with a read length of 2 × 101 bases. An additional mated-pair library, created by Axeq Technologies, Inc. (Rockville, MD), averaged 8.7 kb, with a read length of 2 × 101 bases. Reads were retained only if at least 90% of their bases had a quality score of Q20 or higher. Before assembly, reads mapping to the genome of Buchnera aphidicola, the endosymbiont of A. pisum, were also removed.
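The quality filter described above (at least 90% of bases at Q20 or higher) can be sketched as follows. The filtering tool is not named in the text; this sketch assumes Phred+33 quality encoding, as produced by current Illumina pipelines, and treats each read independently, so how mates of a pair were handled is left open. The function names are hypothetical.

```python
def passes_quality_filter(quality: str, min_q: int = 20, min_fraction: float = 0.9,
                          offset: int = 33) -> bool:
    """True if at least `min_fraction` of bases have Phred quality >= `min_q`."""
    if not quality:
        return False
    good = sum(1 for ch in quality if ord(ch) - offset >= min_q)
    return good / len(quality) >= min_fraction


def filter_fastq(lines):
    """Yield 4-line FASTQ records (header, sequence, '+', quality) that pass the filter.

    A trailing incomplete record is silently dropped by zip().
    """
    for header, seq, plus, qual in zip(*[iter(lines)] * 4):
        if passes_quality_filter(qual.rstrip("\n")):
            yield header, seq, plus, qual
```

With Phred+33 encoding, "5" encodes Q20 and "I" encodes Q40, so a 10-base read with one "#" (Q2) base still passes at exactly 90%, while two such bases fail. The methods for the RNA-seq reads phrase the same threshold as "greater than 90%", which would correspond to a strict > comparison.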
The quality- and Buchnera-filtered reads were then used as input for the de novo assembler ALLPATHS-LG [98,99], which was run with default settings to assemble the RWA2 genome, specifying inward-oriented paired-end libraries, outward-oriented jumping (mate-pair) libraries, and a ploidy of 2 (diploid).
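The library orientations given to the assembler describe how the two reads of a pair face each other once mapped: inward (FR, pointing toward each other) is typical of Illumina paired-end libraries, and outward (RF, pointing away from each other) is typical of Illumina mate-pair libraries. The Python sketch below classifies a mapped pair accordingly. It is illustrative only and not part of ALLPATHS-LG; the `MappedRead` record is a hypothetical simplification of an alignment.

```python
from dataclasses import dataclass


@dataclass
class MappedRead:
    start: int     # leftmost aligned position on the reference (0-based)
    reverse: bool  # True if the read aligned to the reverse strand


def pair_orientation(a: MappedRead, b: MappedRead) -> str:
    """Classify a pair as 'inward' (FR), 'outward' (RF), or 'same-strand'."""
    if a.reverse == b.reverse:
        return "same-strand"
    # Order the reads by position so orientation is judged left to right.
    left, right = (a, b) if a.start <= b.start else (b, a)
    if not left.reverse and right.reverse:
        return "inward"   # -->  <--  typical paired-end fragment library
    return "outward"      # <--  -->  typical Illumina mate-pair (jumping) library
```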
RNA-seq was performed by the NIH/NCI. For each lane, 1 μg of RWA2 RNA was processed according to the Illumina TruSeq RNA Low Sample (LS) preparation protocol and sequenced as paired-end 2×101-base reads on an Illumina HiSeq 2000 using TruSeq v3.0 chemistry. Before assembly, reads were quality-filtered to retain only those in which more than 90% of bases had a quality score of at least Q20. The filtered reads were assembled de novo with Trinity (r2012_10_05; Broad Institute, Cambridge, MA) using default settings. The assembled transcripts were used as evidence during genome annotation, and RNA-seq reads were mapped to the predicted transcripts using CLC Genomics Workbench v7.5.
Transposable and repetitive element analysis
The repeat content of the RWA2 genome was determined from the genome scaffolds using RepeatMasker 4.0.3. The scaffolds were first analyzed with RepeatModeler to identify D. noxia-specific repeats. Masked sequences were then analyzed with RepeatMasker, using the full RepBase repeat database (Repbase 18.07) as evidence, to identify all repeats and transposable elements within the D. noxia genome.
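Genome-wide repeat content is often summarized as the fraction of assembled bases that were masked. The Python sketch below computes that fraction under two stated assumptions: the scaffolds are soft-masked (repeats written in lower case, as RepeatMasker produces with its `-xsmall` option), and runs of `N` representing assembly gaps are excluded from the denominator. It is not the RepeatMasker summary table, and the function names are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def masked_counts(seq: str) -> Tuple[int, int]:
    """Return (soft-masked bases, non-gap bases) for one sequence."""
    non_gap = sum(1 for ch in seq if ch not in "Nn")
    masked = sum(1 for ch in seq if ch.islower() and ch != "n")
    return masked, non_gap


def repeat_fraction(records: Iterable[Tuple[str, str]]) -> float:
    """Fraction of non-gap assembled bases that are soft-masked as repeats."""
    total_masked = total_bases = 0
    for _name, seq in records:
        masked, bases = masked_counts(seq)
        total_masked += masked
        total_bases += bases
    return total_masked / total_bases if total_bases else 0.0
```

For a masked scaffold file, `repeat_fraction(read_fasta(handle))` on an open file handle returns a single genome-wide proportion; summing per-family lengths instead would require RepeatMasker's own annotation output.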
Structural prediction and genome annotation
Structural genome annotation was performed with the MAKER genome annotation pipeline, using the RWA2 genomic scaffolds as input. RepeatMasker was used to mask low-complexity regions and repetitive DNA with the custom repeat database created during the repeat analysis. The following evidence was supplied to aid annotation: EST/RNA evidence from the Trinity-assembled RWA2 RNA-seq data, repeat evidence from the combined D. noxia/RepBase repeat database, and protein evidence from the A. pisum RefSeq protein dataset (NCBI RefSeq, downloaded 03/15/14). AUGUSTUS was run within the MAKER framework to generate ab initio transcript and protein predictions. Pfam analysis was conducted with an HMM-based search (CLC Genomics Workbench v7.0) of all MAKER-derived protein models against the full Pfam database (version 22.0). Transcripts and proteins predicted by MAKER were subjected to BLASTN and BLASTP searches using CLC Genomics Workbench (v7.0).
OrthoMCL was used to assign the 19,097 MAKER-predicted RWA2 proteins to ortholog groups together with the NCBI RefSeq protein sets of the comparison species D. melanogaster (14,067 proteins), A. pisum (24,378), A. mellifera (21,780), P. humanus (11,336), A. gambiae (14,341), B. mori (15,068), and I. scapularis (20,467). Orthologous groups were determined using the OrthoMCL web service (orthomcl.org). First, an all-vs-all BLASTP of each species-specific protein set was performed against the full OrthoMCL database (150 species, accessed 07/15/2014), followed by identification of orthologs, paralog pairs, and in-paralog groups. The results of these analyses were compared directly to identify multiple- and single-copy orthologs among species. To compare single-copy orthologs between species, 37 single-copy orthologs specific to this arthropod group and absent from all other organisms were retrieved from the OrthoMCL database and aligned using MUSCLE. The resulting alignments were concatenated in CLC Genomics Workbench (v7.0), and the concatenated alignment was used to construct a maximum-likelihood phylogeny by neighbor-joining analysis with 1,000 replicates, also in CLC Genomics Workbench (v7.0). Additional phylogenetic analyses used MUSCLE or ClustalW alignments to produce maximum-likelihood phylogenies by neighbor-joining analysis in CLC Genomics Workbench (v7.0).
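The two steps above, selecting groups that contain exactly one protein from every species and joining the per-group alignments end to end into a single matrix, can be sketched in Python as follows. The input structures and species labels are hypothetical, and this does not reproduce the OrthoMCL web service or CLC Genomics Workbench. Padding a species that is missing from an alignment with gap characters is an assumption for the general case; it never occurs when every group is single-copy in all species.

```python
from typing import Dict, List

Groups = Dict[str, Dict[str, List[str]]]   # group ID -> {species: [protein IDs]}
Alignment = Dict[str, str]                 # species -> aligned sequence


def single_copy_groups(groups: Groups, species: List[str]) -> List[str]:
    """Return IDs of groups with exactly one protein from every species."""
    return sorted(
        gid for gid, members in groups.items()
        if all(len(members.get(sp, [])) == 1 for sp in species)
    )


def concatenate(alignments: List[Alignment], species: List[str]) -> Dict[str, str]:
    """Join per-group alignments end to end into one row per species."""
    rows: Dict[str, List[str]] = {sp: [] for sp in species}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within an alignment must have equal length")
        width = lengths.pop()
        for sp in species:
            # Species absent from this alignment are padded with gaps.
            rows[sp].append(aln.get(sp, "-" * width))
    return {sp: "".join(parts) for sp, parts in rows.items()}
```

Keeping the alignment columns in a fixed group order means every species row has the same total length, which is what downstream tree-building tools expect.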
Nucleotide and dinucleotide composition of the genome and predicted transcripts was analyzed using the Sequool software package. The percentage of each nucleotide was calculated per scaffold or transcript, as was the percentage of CpG dinucleotides. The CpG observed/expected ratio was calculated for each transcript as CpG(O/E) = CpG frequency / (C frequency × G frequency).
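The composition statistics above can be computed per sequence as in this Python sketch. The denominators are assumptions not stated in the source: C and G frequencies are taken over unambiguous bases (A, C, G, T), and the CpG frequency is taken over adjacent positions where both bases are unambiguous. Sequool's exact conventions may differ, and some studies use the closely related count form CpG count × L / (C count × G count).

```python
ACGT = "ACGT"


def at_content(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are A or T."""
    s = seq.upper()
    n = sum(s.count(b) for b in ACGT)
    return (s.count("A") + s.count("T")) / n if n else float("nan")


def cpg_observed_expected(seq: str) -> float:
    """CpG(O/E) = CpG frequency / (C frequency x G frequency).

    Returns NaN when the ratio is undefined (no C, no G, or no valid pairs).
    """
    s = seq.upper()
    n = sum(s.count(b) for b in ACGT)
    # Adjacent positions where both bases are unambiguous.
    pairs = sum(1 for a, b in zip(s, s[1:]) if a in ACGT and b in ACGT)
    c_count, g_count = s.count("C"), s.count("G")
    if pairs == 0 or c_count == 0 or g_count == 0:
        return float("nan")
    # "CG" cannot overlap itself, so str.count finds every CpG site.
    cpg_freq = s.count("CG") / pairs
    return cpg_freq / ((c_count / n) * (g_count / n))
```

Applied to each predicted transcript, this gives one CpG(O/E) value per transcript, and the distribution of those values across all transcripts can then be examined.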
The whole-genome shotgun (WGS) project was deposited with the National Center for Biotechnology Information (NCBI) under accession number JOTR00000000 (BioProject PRJNA233413). Raw Illumina DNA reads were submitted to the NCBI Sequence Read Archive (SRA) under BioSample SAMN02693874, and RNA-seq reads under BioSample SAMN03435929. Illumina reads may be accessed under SRA study SRP040557.
We thank Dr. Dana Brunson and Jesse Schafer of the Oklahoma State University High-Performance Computing Center for providing computing hardware and technical expertise.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.