Background

Obligate parasites, such as plant–parasitic nematodes (PPNs), are infamously known for their ability to suppress host defense mechanisms and cripple yield of many agricultural crops. Such devastation is tightly orchestrated by nematode effector proteins that commandeer host–plant metabolic machinery. One of the most destructive PPNs to soybean yield is the soybean cyst nematode (SCN; Heterodera glycines). Worldwide, approximately 1.5 billion dollars in soybean yield is lost annually due to SCN infestations [1],[2]. In SCN susceptible soybeans, this devastation begins when the female juvenile–stage 2 (J2) nematode penetrates the host root. J2 effector proteins are injected into the root, dissolving plant cell walls and driving formation of a metabolically–active, multinucleated feeding site known as a syncytium [3]. Newly–molted J3 males and females feed from this nutrient–rich syncytium, subsequently molt into J4 larvae and copulate [4]. After approximately 30 days post–copulation, a hardened sac of SCN eggs known as a cyst becomes visible to the naked–eye. In the resistant reaction however, cysts are not visible since J2 nematodes can neither form a nutrient–rich syncytium nor copulate. Thus, J2 nematodes starve to death.

With next–generation sequencing (NGS) now becoming a central assay in transcriptomics, entire transcriptomes can now be sequenced at unprecedented resolution. Fueled by the economic impact of SCN infestations, numerous studies have utilized NGS assays to sequence and quantify the soybean transcriptome [5]-[8].

In this study, we extend such works by conducting transcriptomic and regulatory analyses on soybean roots (Peking cv.) inoculated with SCN. We sequence the soybean root transcriptome and contrast resistant and susceptible SCN reactions at 6 and 8 days after inoculation (dai). Our findings reveal likely defense–response gene candidates and a potential regulatory “signature” that captures TFBS over–representation throughout both resistant and susceptible reactions.

Results and discussion

Illumina sequencing and read alignment

cDNA libraries from soybean roots were generated after independently inoculating roots for both 6 and 8 dai in two SCN populations, NH1-RHg (confers resistant reaction in Peking; Race 3) and TN8 (confers susceptible reaction in Peking; Race 14). A baseline control cDNA library was also created from roots uninoculated with SCN. RNA was prepared using the Illumina TruSeq sample preparation kit. Single–end RNA–sequencing (RNA–Seq) was performed on the Illumina GAIIx, producing a total of 30 million reads 80 bp in length. Across all sequenced libraries, quality assessment subtracted between 10%–19% of reads for being either a contaminent sequence or of low quality (Table 1). Using the BWA aligner [9], quality reads were mapped against the soybean transcriptome build version 1.1 [10]. Reads aligning to multiple transcripts were identified and assigned to the transcript with the highest quality score. In total, 59% to 69% of quality–assessed reads mapped to the soybean transcriptome.

Table 1 Soybean–SCN pathogenesis RNA–Seq summary

Soybean transcript abundance and profiling during SCN pathogenesis

Differential expression tests were performed using the R package DESeq [11]. Soybean transcripts were functionally annotated using both Gene Ontology (GO) [12] and PFAM [13]. Both fold change and log2 fold change of expression profiles (as RPKM) were computed between experimental and uninoculated samples. To render a soybean transcript differentially expressed (DE), the transcript had to have a log2 fold change greater than or equal to ±1.0 and have atleast 5 mapped reads across all replicates. A total of 12,377 soybean transcripts were identified to be DE in at least one of the samples (Additional file 1). To disseminate the plant–pathogen defense–response landscape, a subset of 181 DE transcripts were mined and classified given their GO and PFAM functional annotations (Table 2, Additional file 2). Interestingly, virtually all of these annotation classifications exhibited induced expression profiles exclusive to the resistant reaction. For instance, all 12 transcripts of β–1,4–glucanase (β–1,4–G) were generally induced throughout the resistant but suppressed in the susceptible reaction. Numerous studies reveal how a pathogenic nematode can commandeer not only β–1,4–glucanase but other cellulases to drive formation of a nematode feeding site [14]-[16]. Both Tucker et al. [16] and Ibrahim et al. [14] quantified this destructive commandeering capability by quantifying the soybean transcriptome using high–throughout microarrays. This latter study, though examining soybean–root knot nematode interplay, reveals cell–wall modeling, defense response, and metabolism, to be the most impacted host pathways following pathogenic nematode infection. Critical genes encoding isoflavonoid and flavonoid biosynthesis such as chalcone synthase (ChS), chalcone reductase (ChR), and chalcone isomerase (ChI) also exhibited similar induced expression profiles. Glutathione S-transferase (GST) genes were also induced in the resistant reaction. GST is a class of enzymes involved in reactions leading to xenobiotic degradation [17], and has been shown to be induced during an SCN resistant reaction [18]-[20].

Table 2 Various genes perceived during defense response are expressed during SCN inoculation

Transcripts of genes encoding two lipoxygenase (LOX) gene family members, arachidonate 8-lipoxygenase (A–8 LOX; EC: 1.13.11.40) and linoleate 13S-lipoxygenase (L–13S LOX (LOX2); EC: 1.13.11.12) were also induced throughout both 6 dai and 8 dai resistant reactions. The role A–8 LOX plays during a nematode reaction has yet to be elucidated, however lipoxygenases in–general are consistently induced throughout a resistant SCN reaction [21]-[24]. This raises speculation that A–8 LOX may be perceived during SCN pathogenesis.

Ribonucleoside-diphosphate reductase (RnDR; EC: 1.17.4.1) and protein disulfide-isomerase (PDI; EC: 5.3.4.1) were induced in the resistant reaction. Both RnDR and PDI are thioredoxins, a family of reductases known to play defense–response roles upon perception of a pathogen [25]-[27]. Little is known about the role RnDR plays in SCN pathogenesis, however an earlier microarray study examined abaxial and adaxial soybean embryo expression profiles upon exposure to auxin 2,4-dichlorophenoxyacetic acid (2,4–D). Microarray results revealed differentially expressed levels of RnDR 21 days after auxin inoculation [28]. PDI on the other hand, is a well–studied thioreductase expressed during plant defense [29],[30], especially in soybean roots undergoing a resistant SCN reaction [31].

Pathogenesis–Related (PR) transcripts, namely PR5 and PR10, were induced in the resistant reaction. PR genes were expressed not just during SCN nematode pathogenesis [32]-[38] but also throughout abiotic stress [39], phytohormone signaling [40] and drought [41].

Glyoxalase I (GLY I; lactoylglutathione lyase, EC: 4.4.1.5) was also induced throughout the resistant reaction. GLY I has been shown to exhibit an induced expression profile in pumpkin seeds exposed to numerous abiotic stresses [42]. Lastly, little is known about the role phytochelatin synthetase (PCS) plays throughout SCN pathogenesis, however PCS has been shown in a prior study to be induced during aphid herbivory [43].

Following quantification of the SCN–inoculated soybean root transcriptome, our analyses support earlier works by Klink et al. ([44],[45]), Kandoth et al. ([20]), and Li et al. ([33]). We build–on such studies by identifying a small subset of potentially novel defense–response candidate genes as well as a biologically–sound proximal regulatory landscape that captures host–SCN pathogenesis interplay.

Gene Ontology enrichment in resistant and susceptible reactions

To identify statistically significant Gene Ontology (GO) annotations, the top 750 induced and 750 suppressed genes across for all SCN samples each independently underwent GO Process enrichment using the AgriGO server [46]. Numerous GO Processes were statistically significant across resistant and susceptible reactions (Table 3). GO Process p–values were adjusted using Bonferroni False Discovery Rate (FDR) and all GO Processes with adjusted p–values less than 0.05 were selected.

Table 3 Abundance of enriched Gene Ontology annotations

The top 30 most statistically significant GO Processes within induced genes were identified (Table 4). Processes such as “defense response”, “syncytium formation”, “response to other organism”, “response to oxidative stress”, and “response to stress”, were revealed to be statistically significant mainly in the resistant reaction when compared to the susceptible. Processes associated with organelle modification and intracellular organization also exhibited similar reaction–specific significance. This race–exclusivity exposes the crucial role basal operations play during pathogen perception.

Table 4 GO Process enrichment of induced soybean genes

Similarly, the top 30 most statistically significant GO Processes within suppressed genes were also identified (Table 5). Contrasting GO Processes in suppressed genes to that of induced genes reveals an entirely different catalog of annotations. For instance, 20 of the 30 GO Processes in suppressed genes are statistically significant across both resistant and susceptible reactions. This indicates that nematode effectors are generally operable in a race–independent manner and capable of effortlessly suppressing a majority of crucial basal processes.

Table 5 GO Process enrichment of suppressed soybean genes

The most suppressed GO Processes were “photosynthesis”, “photosynthesis, light harvesting”, “photosynthesis, light reaction”, and “generation of precursor metabolites and energy”. Interestingly, it has been shown in prior studies that PPNs can suppress photosynthesis in tomato plants by disrupting cytokinin and gibberellin signaling [47],[48]. Aside from photosynthetic processes, those associated with metabolism and biosynthesis were highly suppressed across both reactions. This suggests that both resistant and susceptible SCN populations share a common goal of crippling basal metabolic machinery and suppressing the host machinery responsible for photosynthesis.

Derivation of over–represented TFBSs

The 1,000 most induced and 1,000 most suppressed genes were identified for each sample and the promoter sequence 2 kb upstream from each genes transcription start site was retrieved and appended to a FASTA file (Additional file 3). To quantify abundance of cis–regulatory TFBSs within promoter sequences, we used a collection of 68 plant Position Weight Matrices (PWMs) from AthaMap [49] and JASPAR [50]. PWMs are multi–dimensional matrices frequently used to model regulatory elements, namely TFBSs. Each cell in a PWM represents a weight as to the likelihood a particular base at a specific index is a regulatory element. Thus, mapping PWMs onto promoter sequences and statistically quantifying its abundance reveals insight into the magnitude of TFBS over–representation. To efficiently execute such mapping, we had developed a multivariate statistical software named Marina [51]. Marina maps TFBS models such as PWMs onto promoter sequences and infers magnitude of TFBS over–representation using 7 knowledge–discovery metrics. The Iterative Proportional Fitting (IPF) algorithm [52] normalizes output produced from each of the 7 metrics, enabling unanimous agreement across the metrics as to the magnitude of TFBS over–representation. IPF scores range from 1 to N whereby N is the total number of over–represented TFBSs. Scores in the range of 1 represent over–represented TFBSs while scores in the range of N represent highly under–represented TFBSs.

For all SCN samples, Marina mapped all 68 plant PWMs onto promoter sequences of both induced and suppressed genes. In total, 46 TFBSs were over–represented in atleast one of the four samples (Figure 1). To reveal which TFBSs exhibited variations in their IPF scores, we computed the percent change of IPF scores across both Race 3 and Race 14 timepoints. The difference in Race 3 and Race 14 percent change was derived and partitioned into 2 bins: TFBSs with a Race 3 and Race 14 IPF score percent difference of at least 50% (Figure 1a), and TFBSs with a Race 3 and Race 14 IPF score percent difference under 50% (Figure 1b). Thus, such computation allows for identification of which TFBSs vary greatly not with respect to 6 dai or 8 dai, but with respect to Race 3 and Race 14 inoculations.

Figure 1
figure 1

A heatmap of Marina IPF scores. Across the four SCN samples, over–represented TFBSs were identified given promoter sequences from the 1,000 most induced and 1,000 most suppressed genes. In total, 46 TFBSs were over–represented in one of the inoculations and 29 TFBSs were over–represented across all inoculations. IPF scores range from 1 to N whereby 1 represents over–represented TFBSs and N represents under–represented TFBSs. (a) Enriched TFBSs within Race 3 and Race 14 reactions with IPF scores having percent difference of at least 50%. (b) Enriched TFBSs within Race 3 and Race 14 reactions with IPF scores having percent difference less than 50%.

There were 29 TFBSs over–represented across all four samples (Additional file 4). If a TFBS was not over–represented in a specific sample, that TFBS was assigned an score of N+1 so as to serve as a proxy for being highly under–represented.

Many TFBSs are over/under–represented in both resistant and susceptible reactions

Contrasting TFBS IPF scores across samples reveals that 30 of the 46 TFBSs either increase or decrease in IPF score regardless of the reaction (Figure 1). For instance, the TFBS for STF1 exhibits a relatively modest increase in its IPF score across both reactions. Interestingly, STF1 IPF score increases from 11th to 1st from 6 dai to 8 dai respectively in the resistant reaction. Besides the role STF1 plays in plant development [53], little is known of the role this transcription factor plays in plant defense.

IPF score for the HAHB4 TFBS greatly increased in the resistant reaction and susceptible reaction. A prior study found HAHB4 to contribute to jasmonic acid and ethylene signaling crosstalk [54]. Similarly, TFBSs for DOF2 and DOF3 exhibited relatively weak increases in IPF scores across resistant and susceptible samples. DOF transcripts have not been explicitly quantified as–far as their gene expression during SCN pathogenesis, however such proteins have been detected during auxin signaling [55]. In contrast to DOF2 and DOF3, the TFBS for TEIL had a near–50% jump in IPF scores across both reactions. Being the tobacco homolog of ethylene insensitive (EIN3), TEIL gene products have been shown to bind directly to the promoter sequence of PR1a, a central contributor in plant defense dynamics [56]. Interestingly, across both resistant and susceptible reactions, TEIL scores appear to be relatively equal to one another.

The A. thaliana MYB77 homolog, AtMYB77, exhibits a mild change in IPF score across both resistant and susceptible reactions. Across both reactions, AtMYB77 IPF scores were generally under–represented at 6 dai but become slightly over–represented at 8 dai. An earlier study revealed interaction between MYB77 and auxin response factor 7 (ARF7) [57], further accentuating the role AtMYB77 could play in host–pathogen interplay [58]. The OsCBT TFBS exhibited pronounced IPF scores across all four treatments. In both the resistant and susceptible reaction, OsCBT was highly over–represented only at 6 dai. It was shown that OsCBT mutants conferred increased pathogen resistance upon inoculation with Magnaporthe grisea, revealing that OsCBT suppresses defense response [59].

Several TFBSs are over–represented in a race–dependent manner

The remaining 16 TFBSs were over–represented in one reaction compared to the other. Such TFBSs can expose novel insight into TFBSs over–representation patterns respective to a specific reaction.

ZAP1, a WRKY1 TFBS [60], appears to be highly over–represented during the resistant reaction but slightly under–represented in the susceptible reaction. Being a WRKY TFBS, it comes as no surprise that enrichment of this TFBS in the resistant reaction captures the need to host a significant, systematic plant defense response. Similarly, PIF3–1 and PIF3–2 were both under–represented during the susceptible reaction however slightly over–represented in the resistant reaction. It has been shown that PIF plays roles in phytochrome signaling [61]. Due to its photomorphogenic regulatory capabilities, Since photosynthetic processes are heavily suppressed within resistant and susceptible reactions (Table 5), such suppression explains why PIF3–1 and PIF3–2 have such severely under–represented IPF scores. Indeed SCN pathogenesis does not only disrupt the photosynthetic machinery but also the plants ability to execute sound phytochrome signaling.

Conclusions

We used RNA–Seq to sequence soybean whole–root (Peking cv.) at both 6 and 8 dai upon inoculation with a resistant (NH1–RHg; Race 3) and susceptible (TN8; Race 14) population. Contrasting TFBSs over–represented in promoter sequences of DE soybean genes across 6 and 8 dai time points exposed underlying transcriptomic and cis–regulatory dynamics within the soybean root during pathogenesis. In–total, over 30 million reads from soybean whole–root was sequenced and differential expression analysis revealed 181 transcripts to be statistically and biologically significant during defense–response. Several viable defense–response gene candidates joined these ranks, including glyoxalase I, arachidonate–8 lipoxygenase, phytochelatin synthetase, and ribonucleoside-diphosphate reductase.

46 TFBSs were rendered over/under–represented across all resistant and susceptible samples. Interestingly, 30 of these TFBSs were either over or under–represented across both reactions. Thus, our results reveal presence of a biologically–sound regulatory “signature” that identifies reaction–specific soybean regulatory patterns during both resistant and susceptible SCN reactions.

Methods

Plant procurement and SCN inoculation

Glycine max cv. Peking seeds were surface–sterilized by treating the seeds with 10% bleach (0.6% sodium hypochlorite) for ten minutes, followed by several washes with distilled water. Seeds were planted in sterile sand in 20 × 20 cm flats. Eight days later, seedlings were gently lifted out of the sand and rinsed clean. Five seedlings for each time point were placed on moistened germination paper in 8 × 12 × 3.5 cm plastic trays. The SCN populations NH1–RHg and TN8, were independently harvested from stock plants [62]. Females were crushed with a rubber stopper and eggs were washed through a 250 micron screen and collected on a 25 micron screen. Eggs were rinsed into a small covered tray and left to hatch for three days. J2 stage nematodes were further purified by passing them through a 30 micron cloth into deionized, distilled water and gently centrifuged at 250 relative centrifugal force (RCF) for one minute to concentrate to 2,000 J2/ml. Roots from four plants were inoculated with one ml of inoculum. Roots were covered with a second piece of moistened germination paper and the trays were placed in a larger tray with 0.5 cm water below to add humidity and wrapped in a semi-clear plastic bag for the duration of the time points. Three uninoculated control plants were also placed trays and collected separately. Per plant, four plant roots, following 6 and 8 days after inoculation (dai), were harvested and immediately frozen in liquid nitrogen and ground to a fine powder in a mortar and pestle and stored in microfuge tubes at –80°C until RNA extraction. The fifth root was stained for visualization of nematode infection with acid fuchsin [63]. RNA was extracted at 6 dai and 8 dai by phenol/chloroform and lithium chloride precipitation [64]. RNA was treated with DNase to remove any genomic DNA remaining in the samples. RNA integrity was checked by visualizing the intact 18S and 28S ribosomal bands on an agarose gel and concentrations were measured on a Nanodrop spectrophotometer (Thermo Scientific; Waltham, MA).

RNA extraction and cDNA isolation

cDNA libraries were prepared using the TruSeq RNA Prep Kit according to the manufacturer instruction (Illumina). Briefly, mRNA was purified from four micrograms of total RNA diluted in fifty microliters of nuclease–free ultra pure water using magnetic beads. Resulting mRNA was fragmented at 94°C for eight minutes. Seventeen microliters of fragmented mRNA was used as template for cDNA synthesis performed by a Superscript II Reverse Transcriptase. Second–strand synthesis was immediately performed and fifty microliters of double stranded DNA was transferred to a new tube and submitted to end repair followed by adenylation of 3’ ends. Once adenylation of 3’ reached completion, adapters containing different indexes were ligated to each library. DNA fragments having adapter molecules on both ends were amplified and enriched. Quantification and quality control were performed by loading one microliter of cDNA libraries on an Agilent DNA–1000 chip and running it on an Agilent Technologies 2100 Bioanalyzer.

Deep–sequencing and transcriptome quantification

For both NH1–RHg (Race 3) and TN8 (Race 14) reactions, cDNA libraries were sequenced from 8 day old soybean whole–root independently inoculated with SCN at 6 dai and 8 dai. Two biological replicates were sequenced for each inoculation and timepoint. Single–end RNA–sequencing was performed on the Illumina GAIIx at the United States Department of Agriculture (USDA), Beltsville, MD. An uninoculated whole–root single–replicate control was also sequenced using the same sequencing protocol. To remove low quality reads across all sequencing runs, custom bash scripts filtered all reads should its 3’ tail have a quality score of less than 22. To remove contaminent reads, sequences were subtracted if they mapped atleast once to both the Ensembl human genome (Hg19) or the JCVI Microbial Resource [65]. Remaining sequences were mapped to the soybean transcriptome (build 1.1) using BWA [9]. Across all SCN inoculated samples, transcript counts underwent normalization and variance estimation using the DESeq R package. To infer magnitude of differential expression, RPKM was computed for all inoculated and uninoculated samples and lo g 2 RPK M inoculated RPK M uninoculated was subsequently derived. All transcripts with a log 2 RPKM less than 1 and fewer than 5 mapped reads were rendered not differentially expressed.

Functional annotation & Gene Ontology (GO) enrichment

Functional annotation comprised of homology–based analysis of all sequences in the Phytozome soybean transcriptome. Of these 73,320 soybean transcriptome sequences, 7,810 sequences were subtracted for being either a scaffold or duplicate sequence. BLASTX [66] aligned the remaining 65,510 query sequences onto all UniProt plant proteins [67]. The top–scoring UniProt function annotation was assigned to the query if it did not contain ambiguous keywords, namely “Hypothetical”, “Uncharacterized” or “Unknown”.

For all samples, soybean Phytozome accessions for the top 750 induced and top 750 suppressed transcripts were identified. Gene Ontology (GO) enrichment on each accession–set was performed using the AgriGO web–server [46]. AgriGO settings were modified to quantify GO annotations using the hypergeometric distribution and Bonferroni p–value false–discovery rate (FDR) correction. To measure GO Process statistical significance in both resistant and susceptible reactions, the –log 10FDR per GO Process was summed across both 6 and 8 dai time points. Subsequently, the top 30 most statistically significant GO Processes from the top 750 induced and suppressed transcript sets were identified.

Availability of supporting data

All RNA–Seq FASTQ raw data is available from NCBI SRA. Please refer to Table 1 for such accessions.

Additional files