Objective

Soybean aphid (Aphis glycines Matsumura; SBA) is the most economically damaging insect pest of soybean (Glycine max) in the United States (US) [1]. Annual economic losses due to SBA in the US are estimated at approximately $4 billion [2]. Although host plant resistance to SBA exists, farmers rely on broad-spectrum foliar insecticide applications to reduce SBA populations [3]. This dependence on chemical management has resulted in pyrethroid resistance in SBA populations in Iowa, Minnesota, North Dakota and South Dakota, as well as adverse effects on non-target beneficial organisms [4, 5]. Host resistance to SBA is not widely adopted, which may partially be due to the presence of four SBA biotypes (i.e., biotype 1: avirulent, biotype 2: virulent to Rag1, biotype 3: virulent to Rag2, biotype 4: virulent to Rag1, Rag2 and Rag1 + Rag2) in the US [6,7,8]. Initial observations of SBA on resistant soybean were attributed to the presence of virulent biotypes [6,7,8]. However, Varenhorst et al. [6] demonstrated that inducer populations of avirulent (biotype 1) or virulent (biotype 2) biotypes improved conditions for subsequent (i.e., response) populations of biotype 1 or biotype 2 SBA on resistant (i.e., Rag1) and susceptible soybean, an effect defined as induced susceptibility [9]. Furthermore, induced susceptibility can be further categorized as feeding facilitation [10] (i.e., a conspecific inducer improves the host for a conspecific response population) or obviation of resistance [11] (i.e., a virulent inducer improves host susceptibility for an avirulent response population). While induced susceptibility effects indicate that not all SBA observed on resistant hosts are necessarily virulent [9], the mechanism underlying these effects is yet to be characterized. Therefore, the major objective of this study was to use RNA sequencing (RNA-seq) to characterize induced susceptibility in soybean when a biotype 2 inducer is present.

Data description

Plant material and aphid biotypes

The data in this submission came from a greenhouse experiment using two soybean genotypes (the susceptible cultivar LD12-1583R and the resistant cultivar LD12-15813Ra, which carries the Rag1 gene) and two SBA populations (biotype 1, avirulent, and biotype 2, virulent [6]). A detailed overview of the experiment is provided in Supplementary file 1 and Figure S1 (Table 1).

Table 1 Overview of data files/data sets

RNA extraction, library preparation, and sequencing

Leaf samples collected at day 1 and day 11 from the resistant and susceptible cultivars (non-infested; infested with inducer biotype 2: response biotype 1) were used to isolate RNA with the PureLink RNA Mini Kit (Invitrogen, USA). Isolated RNA was treated with TURBO™ DNase (Invitrogen, USA) to remove any DNA contamination, following the manufacturer’s instructions. The RNA samples from three replicates were pooled in equimolar concentrations, and the RNA-seq libraries were sequenced on an Illumina NextSeq 500 with 75 cycles. Ten RNA libraries were prepared and sequenced, with sequencing depth ranging from 24,779,816 to 29,724,913 reads (Data files 1–10; Table 1; Table S1).

Quality control assessment

Read quality was assessed using FastQC (version 0.11.3) [12], and the FastQC results were visualized using MultiQC v1.3 [13]. Low-quality bases (quality score < 20) and adapters were removed by trimming with Trimmomatic (version 0.36) [14]. The coding sequences (Gmax: Gmax_275_Wm82.a2.v1.transcript_primaryTranscriptOnly.fa.gz) were obtained from the Phytozome database, and the trimmed reads were mapped to them using Salmon ver.0.9.1 [15] accessed from Bioconda [16] (Data files 11–20). A flow chart of the RNA-seq data analysis pipeline is shown in Figure S2. The downstream analyses were conducted using iDEP 0.82 [17]. Read counts were filtered to retain genes with at least 0.5 counts per million (CPM) in at least one sample. The quantified read counts were then transformed using the regularized log (rlog) transformation implemented in the DESeq2 package [18] (Data file 21). The transformed data were subjected to exploratory data analysis, including hierarchical clustering (Figure S3; Data file 22) and correlation between samples (Figure S4).
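The CPM filtering step described above can be illustrated with a minimal Python sketch. This is not the iDEP implementation: the function names, the toy gene counts, and the use of a "greater than or equal to" comparison at the 0.5 CPM threshold are assumptions made for illustration only.

```python
def counts_per_million(counts):
    """Convert raw counts (gene -> list of per-sample counts) to CPM."""
    genes = list(counts)
    n_samples = len(counts[genes[0]])
    # Library size = total counts in each sample (column sums).
    lib_sizes = [sum(counts[g][j] for g in genes) for j in range(n_samples)]
    return {
        g: [counts[g][j] * 1e6 / lib_sizes[j] if lib_sizes[j] else 0.0
            for j in range(n_samples)]
        for g in genes
    }


def filter_by_cpm(counts, min_cpm=0.5, min_samples=1):
    """Keep genes reaching min_cpm in at least min_samples samples (assumed >=)."""
    cpm = counts_per_million(counts)
    return {
        g: counts[g]
        for g in counts
        if sum(value >= min_cpm for value in cpm[g]) >= min_samples
    }


# Hypothetical toy counts for three genes in two samples.
toy_counts = {
    "gene_a": [3_999_996, 3_999_997],
    "gene_b": [3, 0],   # 0.75 CPM in sample 1, so it is kept
    "gene_c": [1, 1],   # about 0.25 CPM in both samples, so it is dropped
}
kept = filter_by_cpm(toy_counts)
print(sorted(kept))
```

With the threshold set in at least one sample, a gene expressed in only a single condition (such as one time point or one treatment) is still retained, which suits an experiment where each library represents a distinct condition.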

Statistics of transcriptome data

The FastQC analysis showed per-base Phred quality scores higher than 30 for all samples, and GC content ranged from 45 to 46% with a normal distribution (Figure S5, Table S1). After trimming, over 99% of the reads were retained as clean, good-quality reads. Mapping these reads gave a high mapping rate, ranging from 90.4 to 92.9%. Among the mapped reads, 85.8% to 91.9% were uniquely mapped. After filtering with 0.5 counts per million (CPM) in at least one sample, a total of 37,468 genes (66.9% of the original 55,983) were retained for rlog transformation (Data file 21). Hierarchical clustering of sample distances based on the 3000 most variable genes (Figure S3; Data file 22) indicated that samples clustered according to the time point of sample collection (i.e., Day 1 and Day 11). The correlation between samples, calculated using the top 75% of genes, ranged from 0.96 to 1 (Figure S4).
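As a rough illustration of the exploratory statistics reported above, the following Python sketch checks the retained-gene percentage and shows one way to select the most variable genes and compute pairwise Pearson correlations between samples. The expression values and gene names are hypothetical, and the ranking by variance and the choice of Pearson correlation are assumptions; the actual analysis was run in iDEP.

```python
from math import sqrt
from statistics import pvariance

# Arithmetic check of the reported fraction of genes retained after filtering.
retained_pct = 100 * 37_468 / 55_983


def top_variable_genes(expr, n):
    """Return the n gene names with the highest variance across samples."""
    return sorted(expr, key=lambda g: pvariance(expr[g]), reverse=True)[:n]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def sample_correlations(expr, genes):
    """Pairwise correlation matrix between samples over the selected genes."""
    n_samples = len(expr[genes[0]])
    profiles = [[expr[g][j] for g in genes] for j in range(n_samples)]
    return [[pearson(a, b) for b in profiles] for a in profiles]


# Hypothetical transformed expression values (gene -> four samples).
toy_expr = {
    "g1": [1.0, 1.2, 5.0, 5.1],
    "g2": [3.0, 3.0, 3.1, 3.0],
    "g3": [2.0, 2.4, 4.0, 4.4],
    "g4": [6.0, 5.8, 2.0, 2.2],
}
top = top_variable_genes(toy_expr, 3)
corr = sample_correlations(toy_expr, top)
print(round(retained_pct, 1), top)
```

In this toy example the first two samples correlate strongly with each other and negatively with the last two, mimicking samples that group by collection time point.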

Limitations

Quality filtering of the downloadable raw FASTQ files is recommended before use. Because the samples were pooled and have no replication, differential gene expression could be studied using Kal’s z-test [22], as integrated in CLC Genomics Workbench (https://www.qiagenbioinformatics.com/), together with analysis guided by reference genes.
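For readers unfamiliar with this kind of test, the sketch below shows a generic two-proportion z-test for a single gene compared between two unreplicated libraries. It assumes that Kal’s z-test takes this general form (a normal approximation to the binomial applied to the proportion of reads assigned to a gene); it is not the CLC Genomics Workbench implementation, and the function name and read counts are hypothetical.

```python
from math import erfc, sqrt


def two_proportion_z_test(count_a, total_a, count_b, total_b):
    """Z-test on the proportion of reads assigned to one gene in two libraries.

    Assumption: pooled-variance normal approximation to the binomial.
    Returns (z, two-sided p-value).
    """
    p_a = count_a / total_a
    p_b = count_b / total_b
    # Pooled proportion under the null hypothesis of equal expression.
    p0 = (count_a + count_b) / (total_a + total_b)
    se = sqrt(p0 * (1 - p0) * (1 / total_a + 1 / total_b))
    if se == 0:
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Hypothetical example: 200 vs 100 reads for one gene in libraries of 1 million reads.
z, p = two_proportion_z_test(200, 1_000_000, 100, 1_000_000)
print(z, p)
```

Because such a test treats the read counts of single libraries as the only source of variation, it cannot capture biological variability, so results from unreplicated pooled samples should be interpreted with caution.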