Background

Soybean aphid (Aphis glycines Matsumura) is a phloem-feeding insect of soybean [Glycine max (L.) Merr] that is native to East Asia [1]. It was first found in soybean fields in northern United States in 2000 [2], and the aphid rapidly spread over the main soybean growing areas in both the United States and Canada [3]. It is a serious invasive pest of soybean in North America as well as a vector of several viruses [4]. In temperate regions, soybean aphids are heteroecious and holocyclic, where buckthorn (Rhamnus spp.) is the primary host plant and soybean is a secondary host [5]. Soybean aphids sexually reproduce on buckthorn, survive as eggs on the underside of buckthorn leaf buds during the winter, and spread to soybean or its wild relatives starting in late spring. Under favorable conditions, once soybean aphids colonize on susceptible plants in early summer, their populations can grow at an exponential rate and build to thousands of aphids per soybean plant [6,7,8].

Spraying insecticides is the primary method of controlling soybean aphid in infested soybean fields, though it involves significant costs and environmental concerns. There was a 130-fold increase in the use of insecticides in the northcentral United States from 2000 to 2006, mainly to control soybean aphids [9]. Intensive use of insecticides may lead to insecticide resistant aphids [10, 11]. Also, beneficial insect populations can be negatively affected due to intensive uses of chemicals [12, 13].

Genetic host resistance is an alternative, cost-effective and environmentally sustainable strategy to protect soybean crops from the aphids [14]. A number of soybean aphid resistant germplasms were identified through large scale screenings against different biotypes of soybean aphids [6, 15]. Several resistance (Rag) genes and quantitative trait loci (QTL) with resistance to A. glycines have been mapped and aphid resistant cultivars adapted to the northcentral United States have been released [16,17,18,19,20,21,22]. A major gene was mapped on chromosome 13 in PI 567301B at a genomic position near the Rag2 locus in PI 243540 and provisionally named Rag5 [22]. This gene explained over 80% of the phenotypic variance of resistance to A. glycines in PI 567301B [22]. Although the two genes were mapped in close proximity, the mechanisms of resistance of the two genes were different. Detached leaves of PI 567301B lost resistance to soybean aphids, while those of PI 243540 maintained resistance [23]. Rag5 confers antixenosis resistance to soybean aphids, while Rag2 resistance is predominantly antibiosis [22].

The influence of phloem feeding insects, such as aphids, has been well studied in many plant species, including Arabidopsis, maize, tomato, celery, and sorghum [24,25,26,27,28]. Soybean-A. glycines interaction was also studied by employing genotypes conferring Rag1 or Rag2 resistance [29,30,31]. Transcriptomic studies have revealed continuity in the molecular responses to aphids across the plant species, where the expression of genes associated with cell wall development, defense, transcription factors, DNA/RNA processing, secondary metabolism, and hormone-related signaling were altered [32].

RNA sequencing enables profiling of genome-wide changes in gene expression over times under the conditions of interest. This study aimed to investigate global changes in gene expression of homozygous resistant (Rag5/Rag5) and homozygous susceptible (rag5/rag5) NILs of soybean infested by soybean aphid biotype 2. Results of the study provide insights into the interactions between soybean and A. glycines and identify transcriptomic changes in soybean that may be associated with resistance to A. glycines.

Results and discussion

High throughput sequencing and validation of selected DEGs by quantitative RT-PCR

A total of 24 libraries were sequenced on three lanes of the HiSeq2000. Approximately 386 million of the preprocessed reads were aligned to the latest soybean reference genome Glyma.Wm82.a2.v1, with overall mapping rate of 90% of unique reads (mapped/filtered) (Table 1). Variations by lane were adjusted via estimated size factors for data matrix using DESeq2 [33]. After the adjustment, the replicates were generally clustered based on principal component analysis (Additional file 1: Fig. S1). Of the 56,054 soybean annotated reference genes, a total of 35,284 genes were detected as ‘expressed’ with a minimum of 10 reads across the 24 samples. Some housekeeping or reference genes [34, 35] were confirmed with no differences in their expression between the R- and S-NILs throughout the four time points, including Actin (Glyma.08G182200), Cons4 (Glyma.12G020500), Cons6 (Glyma.12G051100), and Tublin (Glyma.08G014200) (Additional file 2: Table S1).

Table 1 Numbers of reads sequenced and mapped per sample

To test the robustness of the RNA sequencing data, the expression levels of eight DEGs were validated by quantitative RT-PCR (qRT-PCR). The validation was carried out using independent samples not previously tested by RNA sequencing. Further, five of these eight genes are annotated to have resistance-related functions, and three genes (Glyma.13G190200, Glyma.13G190300, and Glyma.13G190400) are located near the Rag5 locus (Additional file 3: Table S2). Figure 1a shows relative expression level – log2fold change (R-NIL/S-NIL) – of each gene in RNA sequencing and in qRT-PCR at each time point. In most cases, the direction of differential expression in both techniques was consistent, though there were some differences in magnitude of expression between the techniques. Nonetheless, these genes showed good agreement (R 2 = 0.74) between the RNA sequencing and qRT-PCR results overall (Fig. 1b), indicating good reproducibility of the experiment.

Fig. 1
figure 1

a Validation of results by RNA sequencing using quantitative RT-PCR. a, Relative expression levels were measured by qRT-PCR and compared to those of RNA sequencing for 8 selective genes. b Correlation in log2fold change (R/S) between RNA sequencing and qRT-PCR

Substantial transcriptomic changes between pre- and post-infestation in R- or S-NILs

The post-infestation (6, 12, and 48 hai) gene expression data of the R- and S-NIL were compared with the pre-infestation control (time 0) of the respective NIL. A total of 2481 and 1364 genes were up- or down-regulated in the R- and S-NIL, respectively, in response to infestation by the aphids (Fig. 2a, b). In the R-NIL, 1899 genes were altered within 6 hai, with the numbers of DEGs gradually decreasing to 1391 by 48 hai (Fig. 2a). In the S-NIL, on the other hand, changes in gene expression were less extensive and confined primarily to 12 hai (Fig. 2b). A total of 1148 DEGs were identified at 12 hai, whereas only 453 and 99 DEGs were identified at 6 and 48 hai, respectively (Fig. 2b). The number of DEGs consistent over two-day period (6, 12, and 48 hai) largely differed between the R- and S-NIL; 915 (434 up-and 81 down-regulated) and 20 (all up-regulated) DEGs in the R- and S-NILs, respectively. Sixteen DEGs were common between the 915 DEGs of R-NIL and the 20 DEGs of S-NIL. Regardless of NIL, such DEGs included genes associated with phytohormone signaling and defense response to plant (Additional files 4 and 5: Tables S3 and S4). This suggests the mechanisms of aphid infestation response was, in part, shared between the NILs but more pronounced in the R-NIL.

Fig. 2
figure 2

Counts of differentially expressed genes (DEGs) in the comparison between pre- and post-infestation in the R-NIL a and S-NIL b filtered with false discovery rate (FDR) < 0.01, |log2fold change (post−/pre-infestation)| > 2, and average read count >30

At 6 and 12 hai, 2239 and 1301 DEGs were identified in the respective R- and S-NIL (Fig. 2), of which 1044 DEGs were common in both NILs and 2496 DEGs were specific to R- or S-NIL. Enrichment of gene ontology (GO) terms was analyzed separately by the group of the DEGs, and significant GO terms identified at false discovery rate (FDR) < 0.05 are listed in Table 2. A total of 24 significant GO terms were identified in the three groups. Functional analysis of GO terms indicates that soybean responses to aphid infestation agree with current understanding in mechanisms of plant defense in various hosts – aphid systems [26, 29, 36, 37]. Carbohydrate metabolic process (GO:0005975) and oxidation reduction (GO:0055114) were most highly enriched in the common DEGs significant in both compatible and incompatible interaction of the present study (Table 2). An evolutionary arm race, called the Zig-zag model [38], was proposed to illustrate the molecular interaction between plant hosts and parasites. According to the model, plants detect pathogen-associated molecular patterns (PAMPs), such as chitin and flagellin, and trigger PAMP-associated immunity (PTI). Successful pathogens interfere PTI by secreting effectors to manipulate host cell process. Effectors may trigger susceptibility in plants. Once effectors are recognized by nucleotide binding site-leucine-rich repeat (NBS-LRR) protein, effector-triggered immunity (ETI) is activated. Aphid stylets and exoskeleton are composed of chitin, same as fungal PAMPs [39]. Although it is yet unclear if the Zig-zag model [38] can be extended to aphids or other insects, plants may recognize aphids’ stylets or components in aphid saliva and turn on PAMP-triggered immunity (PTI). PTI can initiate chemical defense, such as generation of reactive oxygen species (ROS), and mechanical defense by cell wall modification and deposition of callose [25, 40]. ROS is involved in defensive signaling in plant tissues, and plants also induce detoxifying proteins that detoxify these oxidative stress [32]. Genes involved in hydrogen peroxide biosynthetic process such as Glyma.09G038500, Glyma.06G225500, and Glyma.04G153700 were found highly upregulated (>8 times) in the R-NIL after aphid infestation (Additional file 4: Table S3), while upregulation was delayed and limited at 12 hai in S-NIL. Related to detoxification, oxidoreductase activity (e.g. GO:0016701, GO:0016491, GO:0016703, GO:0016903) was also found statistically significant in both R- and S-NILs. Genes associated with fatty acid biosynthetic process (GO:0006633) and fatty acid metabolic process (GO:0006631), lipid metabolic process (GO:0006629) were enriched in R-specific response. This may be related because plants are known to consume polyunsaturated fatty acids (linoleic and linolenic acid) for jasmonic acid (JA) biosynthesis [31]. Activated JA signaling was effective to significantly reduce aphids in Arabidopsis [41]. Some genes related to JA biosynthetic process such as Glyma.07G034800, Glyma.15G026400, Glyma.14G103100, and Glyma.19G008700 were highly upregulated (averaging over 5-folds) in R-NIL after aphid infestation (Additional file 4: Table S3). Hormone-related signaling triggered by aphid infestation including JA, salicylic acid, and ethylene, plays critical roles in regulating plant defenses [32].

Table 2 Enriched gene ontology terms identified from the 2496 differentially expressed genes in the R- and/or S-NILs during 12 h after aphid infestation

At 48 hai, 1391 DEGs were identified in the R-NIL, while only 99 DEGs were in the S-NIL (Fig. 2a, b). Of those in the R-NIL, 242 genes were unique at 48 hai (Fig. 2a). These DEGs generally differed in their functions from those responded earlier at 6 and 12 hai. The frequencies of DEGs involved in DNA replication (GO:0006260, FDR = 2.7e-18), DNA metabolism (GO:0006259, FDR = 4.4e-10), and ligase activity (GO:0051002, FDR = 4.6e-5) were highly significant in enriched GO term analysis (Additional file 6: Table S5). This was different from the S-NIL because such GO terms were not significant or remarkable from 63 DEGs unique at 48 hai in the S-NIL (Additional file 7: Table S6). This may be related to aphid resistance in R-NIL, because the R-NIL may immediately begin to recover damaged cells or regenerate new cells after the initial interaction with aphids. In contrast, these processes may not occur immediately or be delayed in the susceptible line.

Comparison of mRNA abundance between R- and S-NILs following aphid infestation

While comparison between pre- and post-aphid infestation explained general aspects of transcriptomic reprograming in response to aphid infestation, changes in abundance of transcripts between the R- and S-NILs for different time points would provide a better understanding of regulation of Rag5-conferring soybean aphid resistance in PI 567301B. Relaxed stringency criteria – FDR < 0.1, |log2fold change| > 1, and average read >10 – were adopted for this comparison between R- and S-NILs since the same stringency as pre- vs. post-infestation resulted in a very small number of the DEGs. A total of 556 DEGs were identified between the R- and S-NILs, which were statistically significant at one or more time points (Additional file 8: Table S7). The resistant and susceptible genotypes are backcross (BC5) lines, and thus highly (>98%) genetically identical to the recurrent parent (i.e. Wyandot). Thus, while the number of DEGs between NILs is small, they are more likely to reflect biological significance rather than genetic background. Dynamic changes in the gene expression level between the two genotypes were observed during the initial 12 h, and the total number of DEGs was considerably smaller at 48 hai (Fig. 3). This result suggests that the initial 12 h is a critical period in soybean-aphid interaction, supporting the earlier reports that approximately 8–9 hai was the key time period of soybean-aphid interaction [29]. In the present study, transcripts of 76 genes were consistently and significantly up-regulated in one NIL than the other for at least 12 hai. These genes are listed in Additional file 9: Table S8. Not surprisingly, the annotated functions of these genes were mostly similar with the defense mechanisms found in the comparison of pre- vs. post-infestation. GO term enrichment analysis of the 76 DEGs identified only significant GO term for ADP binding (GO:0043531). This GO term is far too generalized to make any inference regarding underlying mechanism. Corresponding DEGs are related to defense response, signal transduction, and SA biosynthetic process (Additional file 9: Table S8).

Fig. 3
figure 3

a Counts of differentially expressed genes (DEGs) identified in the comparison between R- and S-NILs at pre-infestation, 6, 12, and 48 h after infestation, filtered with false discovery rate (FDR) < 0.1, |log2fold change (R/S)| > 1, and average read count >10. R-NIL indicates the number of DEGs upregulated in R-NIL, while S-NIL does those upregulated in S-NIL at each time point. b Venn diagram displaying the detailed distribution of DEGs

Aphid behaviors such as probing and feeding can be affected by biochemical components such as pH, sucrose, and amino acid contents in phloem sap [40, 42]. This can be one reason of preference leading to antixenosis type of resistance provided by the Rag5 gene. This interaction would be likely determined within a short period of time after initial contact. Thus, a chemical component would more likely be constitutive in resistant genotype. Similarly, in transcript levels, constitutively expressed genes were thought to be responsible for many defense-related responses or differential priming of resistance in resistant plants, including wheat [43]. Studham and MacIntosh [31] also discussed that constitutive resistance might be, in part, responsible for the response observed in R NILs under aphid-absent condition. In the present study, 82 DEGs were upregulated in the R-NIL compared to the S-NIL in the absence of aphids, and 14 of these genes are annotated as leucine-rich repeat (LRR)-containing protein or LRR receptor-like protein kinase. These proteins are related to defense response [44] and may confer the ability to rapidly recognize aphid infestation and trigger initial responses. Many secondary metabolites, such as phenylpropanoids and terpenoids, have been implicated in plant defense against phloem-feeding insects [32], yet unknown if these DEGs are connected to metabolomic differences in the NILs. Elucidating this relationship could provide further understanding of antixenosis.

Thirteen candidate genes located in the Rag5 locus of chromosome 13

After initial genetic mapping of the Rag5 gene in the Wyandot × PI 567301B population, backcross populations developed using Wyandot as the recurrent parent were used to fine map the Rag5 locus. The locus was placed within a 120-kb interval (Gm13:30.33 ~ 30.35 Mb) on chromosome 13 (unpublished data). The published soybean genome was sequenced using the aphid-susceptible variety Williams 82 [45] that does not have the Rag5 allele. Williams 82 has 13 annotated genes in the 120-kb interval containing the Rag5 locus (Table 3). There are three copies of genes with the annotated function ‘protease family S26 mitochondrial inner membrane protease’ (Table 3). One of them, Glyma.13G190200, showed a significant difference in gene expression (FDR < 0.001) and was expressed approximately 4-fold higher in S-NIL for all time points (Table 3). Another ‘protease family S26 mitochondrial inner membrane protease’ gene (Glyma.13G190500) was also significant at 6 (FDR < 0.05) and 12 hai (FDR < 0.001) with almost 4-fold higher expression in R-NIL. It is unknown why the gene encoding ‘protease family S26 mitochondrial inner membrane protease’ is related to aphid resistance and even why the two genes with the same annotation responded differentially during the two-day period. A gene of unknown function (Glyma.13G190600) was also detected with four-fold higher mRNA expression in R-NIL than in S-NIL, which was consistent from 0 to 12 hai (Table 3).

Table 3 Thirteen genes annotated in the Rag5 candidate region of chromosome 13 based on Glyma.Wm82.a2.v1 and comparison of their transcript abundance between R- and S-NIL

NBS-LRR genes are well known as a major group of genes conferring resistance against virulent biological agents in many crops [44, 46]. The first cloned root knot nematode resistance gene in tomato, Mi-1.2, is an NBS-LRR type of plant resistance gene [47]. Aphid-resistance genes identified in melon and Medicago truncatula were mapped to a NBS-LRR gene cluster [48, 49]. In the soybean – A. glycines interaction, microarray contrasting Rag1 genotype (cv. Dowling) and susceptible genotype (Williams 82) also identified an NBS-LRR type of gene with constitutively higher expression during 3 days after aphid feeding, which was proposed as a strong candidate gene for Rag1 [29]. Recently, transcriptomic profiling using the NILs carrying either Rag2 or rag2 allele from PI 243540 revealed that a NBS-LRR coding gene, Glyma13g25970 (Glyma.13 g190400 in Glyma.Wm82.a2.v1), located in the Rag2 (PI 243540) interval (unpublished) was highly upregulated in the resistant Rag2 (PI 243540)-NIL [30]. Also, two non-NBS-LRR coding genes, Glyma13g25990 (no correspondence in Glyma.Wm82.a2.v1) and Glyma13g26010 (Glyma.13 g190200 in Glyma.Wm82.a2.v1), in the Rag2 (PI 200538) interval were found significantly upregulated in the resistant Rag2-NIL [30]. It was reported that PI 243540 and PI 567301B have distinct modes of resistance against biotype 2 aphid [23]. Different genes may be involved in the two modes of resistance and thus, regulation of Glyma.13G190200 is not necessarily same in the two modes of resistance.

The fine mapped genomic interval of Rag5 partly overlapped with the (~90 kb) window of Rag2 (PI 243540) (unpublished data) and with the candidate genomic region (54 kb) of Rag2 (PI 200538) [50]. This region is a well-known R-gene clustered region. Among the thirteen genes, four were predicted to encode LRR-containing protein (Glyma.13 g190000, Glyma.13 g190300, Glyma.13 g190400, and Glyma.13 g190800). Two genes (Glyma.13 g190400 and Glyma.13 g190800) showed two-fold higher expression in the R-NIL only at no aphid infestation (time 0), while the FDR of the respective genes was greater than the filtering threshold of 0.1. However, the difference between R- and S-NILs for both genes diminished after aphid infestation (Table 3). This result suggests that such LRR type of genes may not be primarily responsible for Rag5-conferring aphid resistance. With different modes of resistance of Rag2 (antibiosis) and Rag5 (antixenosis), it is expected that the molecular mechanisms of resistance of the two genes will be different, even though their genomic locations may be in a close proximity. It is also plausible that our study may not be able to capture the key genes in this region, because the genome of Williams 82 does not possess any aphid resistance allele.

Conclusions

The present study compared changes in mRNA abundance following aphid infestation of resistant and susceptible NILs developed for the Rag5 locus. Many DEGs were common between the two genotypes and were functionally relevant to known defense mechanisms reported in various host-aphid systems. The responses to aphid infestation in the two NILs significantly differed in timing of changes in gene regulation, which was triggered more quickly in the R-NIL. There was also a high level of commonality found in the DEGs between the NILs, which mostly function in mediating plant defense against aphids. Transcriptomic analyses on R- and S-NILs provided reduced lists of DEGs at each time point when compared to studies conducted with unrelated pair of genotypes.

By assessing the DEGs in the Rag5-containing QTL region on chromosome 13, three non-NBS-LRR candidate genes – Glyma13g190200, Glyma13g190500, and Glyma13g190600 –showed distinct changes in their expression between R- and S- NILs. Moreover, four NBS-LRR genes in the candidate region did not have differential expression between R- and S-NILs at any time point. Our results indicate that the Rag2 and Rag5 resistance may be conditioned by different genes. Further studies, such as metabolomics and proteomics, will be helpful to examine the roles of these candidates in aphid resistance.

Methods

Plant materials, aphid infestation, and sampling

PI 567301B was identified as a source of Rag5 by QTL mapping with Wyandot x PI 567301B RIL population [22]. For fine-mapping of the QTL and further studies, NILs segregating at the resistance locus Rag5 were developed at USDA-ARS, Wooster, OH by five backcrosses (BC) using the cultivar Wyandot as the recurrent parent and PI 567301B as the donor parent. During development and selection of BC lines, the soybean aphid resistance of each BC line and two parental lines was evaluated by counting the number of aphid per plant at 3 weeks after infestation using the method published earlier [22]. A plant was scored as resistant if it had <25 aphids and a plant was scored as susceptible if it had >500 aphids. There was a clear difference in the aphid counts to identify a resistant plant from a susceptible one. As a precaution, plants with aphid numbers between 25 and 500 were discarded. All plants were also screened with 10 SSR markers between OSU180 and BARCSOYSSR_13_1168, flanking the Rag5 locus [22]. BC lines were selected based on quality of phenotypic data and marker genotypes and two BC5 NILs (R- and S-NILs) were finally selected in the fall of 2010 from a total of 120 BC5F3 plants segregating for the Rag5 locus. Individual seed of homozygous aphid resistant (Rag5/Rag5) and susceptible (rag5/rag5) BC5 lines were planted in 4-cm diameter × 15-cm deep plastic cone-tainers (Stuewe & Sons, Tangent, OR) and seedlings were grown in a growth chamber in OARDC. The growth chamber was maintained at 24/20 °C and 16 h/8 h day/night temperatures and light/dark periods, respectively.

Biotype 2 aphid was first collected in a soybean field at OARDC, Wooster, OH, which was confirmed as a new biotype of soybean aphid and named as biotype 2 [51]. A pure laboratory colony of biotype 2 has been maintained in a growth chamber at OARDC and all aphids used in this experiment were collected from this growth chamber colony. Fifteen adult aphids were placed on the top leaves of resistant and susceptible plants at V2 stage. The youngest partially expanded trifoliate leaf along with the growing point from each seedling was sampled in a 2-ml tube at time 0 (i.e., without aphid infestation), and at 6, 12, and 48 hai. Each seedling was sampled once and discarded immediately. At each collection time, a new set seedlings from the remaining infested seedlings were used for tissue collection. The tissue from each seedling represented a replicate in the study. At tissue collection at 6 and 12 hai, more aphids were added on seedlings meant for later time points if less than 10 aphids were counted on a seedling, in order to maintain continuous aphid pressure on each seedling. All aphids and nymphs were removed from the leaf before tissue collection. For time 0 samples, leaves were collected just before infestation of aphids. Leaf samples were immediately frozen in liquid nitrogen and later at −80 °C.

RNA preparation and high-throughput sequencing

Collected leaf tissue was ground by a pestle in liquid nitrogen and total RNA was extracted using RNeasy Plant Mini Kit (Qiagen, Germantown, MD, USA) according to the manufacturer’s instructions. Quality and quantity of the extracted RNA was assessed using Agilent RNA 6000 Nano Kit and 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). RNA (1μg/sample) was used to synthesize a double-stranded (ds) cDNA library for each sample using Illumina’s TruSeq RNA Library Prep Kit v2 (Illumina Inc., San Diego, CA, USA) as per the manufacturer’s instructions. The ds-cDNA libraries were quantified using the Qubit 2.0 fluorometer (Life Technologies, Carlsbad, CA, USA) and quality of each library was assessed using Agilent DNA 1000 Kit and 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). Of the total 24 cDNA libraries (4 time points × 2 genotypes × 3 replicates), 8 libraries representing single replicates were multiplexed with different barcode sequences embedded in preparation for high-throughput sequencing. Each of the multiplexed libraries (50 mol) was sequenced in 100-bp paired-end fashion, in three lanes of Illumina HiSeq2000 platform at the Ohio State University’s Comprehensive Cancer Center, Columbus, OH, USA.

Read mapping and statistical analyses

The Galaxy platform facilitated the execution of software for analyzing high-throughput sequencing data, which is maintained by the Molecular and Cellular Imaging Center Computational Biology Laboratory (MCBL) at OARDC. FastQC [52] was used to obtain statistics on read quantity and quality before preprocessing. Cutadapt [53] was used to remove adapter sequences and poly-A and -T tails with default settings except the followings; 3′ end and minimum overlap length of 6. Then, read sequences were controlled by the function trim the reads by quality using the default options with the following changes; 40 for length threshold, 20 for quality threshold (Phred score), and discard of internal N.

Read mapping to the soybean genome assembly (Glyma.Wm82.a2.v1) was conducted by the splice junction mapper, Tophat2 [54], with the default options except; 50 for mean inner distance between mate pairs, 100 for standard deviation for distance between mate pairs, and no report discordant pair alignment. A few input parameters were changed as followings; minimum intron length 70 and maximum intron length 20,000. The reads mapped to the genome were counted using HTSeq [55] and the mode union was used to handle multi-mapping reads. The quantified transcript read files were imported into Bioconductor, an open-source software project based on the R programming language [56]. Effects of aphid infestation were initially investigated by comparing changes in gene expression between pre- and post-aphid infestation in the respective R- and S-NILs using DESeq2 [33]. DEGs were filtered as the following criteria; mean read counts >30, |log2fold change| > 2, adjusted P-value (false discovery rate, FDR) < 0.01. Transcriptomic differences caused by aphid infestation between R- and S-NILs were subsequently identified at each time point and the DEGs were filtered by thresholds - mean read counts >10; |log2fold change| > 1; FDR < 0.1.

Gene ontology (GO)

GO term enrichment analysis on the various groups of DEGs was conducted using agriGO (available at http://bioinfo.cau.edu.cn/agriGO/) for Singular Enrichment Analysis [57]. For each analysis, Glycine max v2.0 was selected as the species and Glycine max Wm82.a2. v1 was used as reference with defaults in other parameters, including FDR < 0.05.

Validation of gene expression using quantitative reverse transcription (qRT)-PCR

cDNA was transcribed using Bio-Rad® iScript synthesis kit (Bio-Rad, Hercules,CA). Primer pairs were designed to confirm expression of 8 targeted genes using Primer 3 [58], as listed in Additional file 3: Table S2. Actin (Gyma.08G182200) [34, 35] was used as the endogenous qRT-PCR control using forward primer 5′-AGTTCTGCAGTGGAAAAGAGCTA-3′ and reverse primer 5′-TCATGAATTCCAGTAGCTTCCAT-3′. Real-time quantification was conducted using Bio-Rad® iScript iQ™ SYBR Green Supermix Kit and Bio-Rad® CFX96™ system. Final volume 15ul of PCR mixture included 200 ng of cDNA, 400 nM of each primer, 2.7ul of sterile water and 7.5ul of iQ SYBR Green Supermix. PCR amplification was conducted as manufacturer’s suggestion as the following: 95 °C for 3 min, 35 cycles of 95 °C for 10s and 59 °C for 30s, and followed by melt curve analysis. There were 3 biological replications and 2 technical replications for each gene per time point. Relative expression levels and fold changes were determined using comparative C T method [59]. Statistical analysis was conducted using PROC TTEST in SAS 9.4 (SAS institute, Cary, NC).