Transcriptome sequencing and drought resistance gene annotation in Quercus liaotungensis leaves

Q. liaotungensis is an important drought-resistant tree species in Northeast China where the climate is dry and rainless. In this study, we performed a deep transcriptomic sequencing in Q. liaotungensis leaves, including de novo assembly and functional annotation for screening the candidate genes involved in drought avoidance. A total of 25,593 unigenes were obtained from Illumina sequencing platform. According to Gene Ontology annotation and KEGG pathway enrichment analysis, we screened a series of candidate genes encoding SOD, POD, CAT, DREB, MYB, WRKY, bZIP, and NAC from the Q. liaotungensis leaf transcriptome, all of which are potentially involved in drought resistance. The results of this study expanded the genetic resources of Q. liaotungensis and provided a theoretical basis for further exploring the functional gene information of Q. liaotungensis.


Introduction
Oaks belong to the genus Quercus, which comprises several hundred diploid and highly heterozygous species spreading throughout the northern hemisphere, from the tropical to the boreal regions (Abrams 1990). Among the oak species, Q. liaotungensis, which leaf is the main food source for Antheraea pernyi, is an important drought-resistant tree species in the northern warm temperate zone. The leaf morphology of Q. liaotungensis is relative small which could adapt to the dry and rainless climate in Northeast China. However, a detailed characterization of genomic information of Q. liaotungensis leaf coping with drought is lacking.
Recent transcriptomic and gene expression profiling studies in oaks have led to the construction of large cDNA libraries (Ueno et al. 2010;Kremer et al. 2012;Tarkka et al. 2013;Torre et al. 2014), and RNA-Seq studies have made it possible to identify genes involved in drought-resistant (Torre et al. 2014). In this study, the leaves of Q. liaotungensis were used for transcriptome analysis for expanding the genetic resources of Q. liaotungensis. Besides, a series of drought-related factors of Q. liaotungensis leaf were also screened. The results of this study may provide a theoretical basis for further research on drought resistance mechanism of Q. liaotungensis.

Sample preparation and RNA extraction
Q. liaotungensis used in this study were located at the research base of Shenyang Agricultural University under natural environment at 23 ± 2 °C with 70 ± 5% relative humidity. Juvenile and mature leaves (Fig. 1A) from 30 individuals of 4-years old plants were collected and immediately frozen in liquid nitrogen and stored at − 80 °C until processing. Total RNA was extracted using Trizol reagent (Invitrogen) according to the manual. The quantity and purity of RNA were analyzed using NanoDrop ND-1000

De novo transcriptome assembly
The adaptor reads of the raw data were removed using Cutadapt software (https:// cutad apt. readt hedocs. io/ en/ stable/) (Martin 2011). The clean data were obtained by removing the low quality and repeat reads. The sequence quality was verified by FastQC software (http:// www. bioin forma tics. babra ham. ac. uk/ proje cts/ FastQC/), including Q20, Q30, and GC content. The clean data were de novo assembled using Trinity software (Grabherr et al. 2011). Trinity grouped transcripts into clusters based on the shared sequence content. Each transcript cluster was referred to as a 'unigene', and the longest transcript sequence was chosen as the gene sequence of 'unigene'.

Validation of RNA-Seq by quantitative RT-PCR
Ten unigenes were selected for qRT-PCR to validate the transcriptome data. Total RNA from the Q. liaotungensis leaf samples used for qRT-PCR were the same as for RNA-Seq. The gene-specific primers were designed using the predicted CDSs as reference sequences. qRT-PCR was performed on a CFX Connect™ Real-Time System (Bio-Rad) using a 20-μL reaction system with a procedure as follows: 95 °C for 30 s, followed by 39 cycles of 95 °C for 5 s, 60 °C for 30 s. Melting curves were generated after each run to confirm a single PCR product. Each reaction was run in triplicate. mRNA quantity of each gene was calculated with the 2 −ΔCT method (Livak and Schmittgen 2001).

Results and discussion
The RNA-seq data generated 54,153,182 raw reads. After filtering out the low-quality reads, 53,021,436 clean reads were obtained, which assembled into 41,207 transcripts with a mean length of 704 bp and GC content of 42.17%, and 25,593 unigenes with a mean length of 687 bp and GC content of 42.31%, based on Trinity assembly platform (  confirming the expression of the unigenes identified in the deep sequencing analysis. The unigenes were classified into three categories including cellular component, molecular function and biological process via GO analysis, with 675, 1905 and 2849 GO terms corresponding to each category, respectively. The top 25 significantly clustered GO terms of the unigenes for each category are shown in Fig. 4. Among the subcategories of cellular component, assignments were mostly given to nucleus, integral component of membrane, plasma membrane, cytoplasm, chloroplast, mitochondrion, and cytosol. The majority of the annotated unigenes were assigned to protein binding, ATP binding, protein serine/threonine kinase activity and DNA binding in molecular function. Dominant GO terms of the biological process subcategories were grouped into regulation of transcription, defense response, protein phosphorylation, oxidation-reduction process, and signal transduction. KEGG pathway enrichment analysis was conducted for unigenes obtained from the transcriptome. The results showed that the unigenes in Q. liaotungensis could be assigned to 138 pathways. Among the 25 top KEGG pathways with the highest representation of unigenes, the abundant genes mapped onto endocytosis, plant hormone signal transduction, starch and sucrose metabolism, RNA transport, biosynthesis of amino acids, carbon metabolism, and amino sugar and nucleotide sugar metabolism (Fig. 5).
We screened a series of candidate genes potentially involved in drought adaptation via GO and KEGG analysis  Table 3).
Drought has several effects on plant growth and development, one of which is oxidative damage. We identified 41 genes encoding superoxide dismutase (SOD), peroxidase (POD) and catalase (CAT) ( Table 3) which compose the antioxidant defense system in plants that can cooperate to resist the damage of active oxygen to cells. Seven candidate  (Table 3) including DREB1 and DREB2 were obtained. Overexpression of DREB1 increased tolerance of Malus baccata to low temperature, drought, and salt stresses (Yang et al. 2010). When treated with dehydration, the expression of DREB2A increased in Arabidopsis thaliana (Sakuma et al. 2007). Several drought resistancerelated TFs such as MYB and bZIP which were founded in Quercus pubescens leaves (Torre et al. 2014) were also identified in this study. We screened 5 genes encoding MYB TFs, which is one of the largest transcription factor families and play regulatory roles in developmental processes and defense responses in plants, including MYB108, 24, 44, 12 (Table 3). The bZIP and NAC TFs are known to play a crucial role in response to various processes in plant as well as abiotic or biotic stress challenges such as drought (Xiang et al. 2008;Huang et al. 2015). WRKY proteins are newly identified TFs that are also involved in drought tolerance in plants (Ren et al. 2010). Overall, we screened 10, 42, and 6 candidate genes encoding bZIPs, WRKYs, and NACs, respectively (Table 3).
Through high-throughput sequencing, we can quickly obtain the desired target genes. As a consequence, the results of this study expanded the genetic resources of Q.   liaotungensis and provided useful information for further research on drought resistance mechanisms of Q. liaotungensis.
Author contribution statement GW and LQ conceived the project; GW performed the experiments; GW analyzed the data and wrote the paper.