Transcriptional profiling of long noncoding RNAs associated with leaf-color mutation in Ginkgo biloba L.
Long noncoding RNAs (lncRNAs) play an important role in diverse biological processes and have been widely studied in recent years. However, the roles of lncRNAs in leaf pigment formation in ginkgo (Ginkgo biloba L.) remain poorly understood.
In this study, lncRNA libraries for mutant yellow-leaf and normal green-leaf ginkgo trees were constructed via high-throughput sequencing. A total of 2044 lncRNAs were obtained; they had an average length of 702 nt and typically harbored 2 exons. We identified 238 differentially expressed lncRNAs (DELs); 32 DELs and 49 differentially expressed mRNAs (DEGs) constituted coexpression networks. We also found that 48 cis-acting DELs regulated 72 target genes and that 31 trans-acting DELs regulated 31 different target genes, which provides a new perspective on the regulation of the leaf-color mutation. Because lncRNAs have crucial regulatory roles in a wide range of biological processes, we studied the DELs and their targets in depth and found that the chloroplast thylakoid membrane subcategory and the photosynthesis pathways (ko00195) were the most enriched, suggesting potential roles in leaf coloration mechanisms. In addition, our correlation analysis indicated that eight DELs and 68 transcription factors (TFs) might be involved in interaction networks.
This study has enriched the knowledge concerning lncRNAs and provides new insights into the function of lncRNAs in leaf-color mutations, which will benefit future selective breeding of ginkgo.
Keywords: Leaf-color mutation; Differentially expressed lncRNAs; Target genes; Functional analysis
CPC: Coding Potential Calculator
DEGs: Differentially expressed mRNAs
DELs: Differentially expressed lncRNAs
FPKM: Fragments per kilobase per million reads
Ginkgo: Ginkgo biloba L.
GL: Normal green-colored leaves
KEGG: Kyoto Encyclopedia of Genes and Genomes
lincRNA: intergenic lncRNA
lncRNA: long noncoding RNA
PLEK: Predictor of long noncoding RNAs and messenger RNAs based on an improved k-mer scheme
qRT-PCR: quantitative real-time PCR
SRA: Short Reads Archive
Only 1 to 2% of the total RNAs produced by eukaryotic cells during transcription encode proteins, and the remaining RNAs are called noncoding RNAs (ncRNAs). ncRNAs play important roles in cells: rRNAs and tRNAs function in protein synthesis, snRNAs in the splicing of nascent RNA, and microRNAs, siRNAs and piRNAs in inhibiting gene expression. Among ncRNAs, there is a widely distributed class of transcripts with lengths greater than 200 nucleotides and no protein-encoding function, named long noncoding RNAs (lncRNAs) [2, 3, 4, 5]. Most lncRNAs are transcribed by RNA polymerase II and have structural features similar to those of mRNAs, such as 5′ caps and 3′ poly(A) tails [6, 7]. According to their genomic location relative to neighboring genes, lncRNAs can be divided into five classes: sense lncRNA, antisense lncRNA, intergenic lncRNA (lincRNA), intronic lncRNA, and bidirectional lncRNA. Moreover, lncRNAs can be classified into signals, decoys, guides and scaffolds based on molecular function [3, 9].
LncRNAs can be dynamically expressed during differentiation, and different mature lncRNAs can be formed by polyadenylation and alternative splicing events, allowing the same gene to produce different lncRNA transcripts. LncRNAs are universally transcribed in eukaryotic cells and are distributed in the cytoplasm, organelles and nucleus, but mainly in the nucleus. LncRNA was first identified in a sequencing analysis of mice in 2002. The functional mechanisms of lncRNAs in humans and animals have since been studied in great depth, especially in relation to diseases. In recent years, with the continuous improvement of bioinformatics, high-throughput sequencing and other biological technologies, research on plant lncRNAs has developed rapidly and received increasing attention. LncRNAs have now been widely identified in plants such as Arabidopsis thaliana, Zea mays, Salvia miltiorrhiza, and Populus. LncRNAs can affect a series of biological processes, such as epigenetic regulation, cell cycle regulation, cell differentiation regulation and secondary metabolite synthesis, by regulating the expression level of target genes [16, 17, 18].
Ginkgo (Ginkgo biloba L.) is a well-known relict plant that originates from China and has been described as a “living fossil”. As a multifunctional tree species, ginkgo has important economic and medicinal value and has attracted researchers’ attention; many studies have been reported on its origin and evolution, cytology, molecular biology, tree breeding and medicinal value [19, 21, 22, 23, 24, 25]. Ginkgo is also a popular ornamental species and is widely cultivated worldwide. However, there are few studies on its ornamental characteristics. Leaf color is an important trait of ginkgo as a landscape plant, and the most attractive ornamental feature of ginkgo is its golden leaves in autumn. The yellow-leaf mutant identified in ginkgo shows yellow leaves throughout the entire leaf development period and has a longer foliage period than common ginkgo. Thus, the mutant not only has excellent ornamental value but also provides ideal material for studying the genetic control of leaf pigment synthesis.
Previous studies have provided an understanding of the protein-coding genes involved in leaf-color mutation [26, 27], but the role of lncRNAs in the yellow-leaf mutation has rarely been reported. In this study, normal green leaves and mutant yellow leaves of ginkgo were used as research materials to investigate the regulatory mechanism of lncRNAs in leaf-color mutation. Our objectives were to (1) establish lncRNA libraries and identify and characterize the putative lncRNAs expressed; (2) construct a coexpression network for differentially expressed lncRNAs (DELs) and differentially expressed mRNAs (DEGs); (3) predict the target genes of cis- and trans-acting lncRNAs and their functions; and (4) perform correlation analysis between lncRNAs and transcription factors (TFs) in ginkgo leaves. These findings will provide a scientific foundation for further research on the mechanisms underlying leaf-color mutations and will benefit the future selective breeding and cultivation of ginkgo.
RNA sequencing and identification of lncRNAs in ginkgo leaves
Quality of the sequencing data
LncRNA expression level analysis
Identification and validation of DELs between GL and YL
To further verify the transcriptional patterns of DELs from the RNA-seq analysis, ten DELs were randomly selected and their expression levels were examined using quantitative real-time PCR (qRT-PCR). Although the fold changes of several DELs measured by qRT-PCR were not exactly the same as those calculated from the FPKM values, the expression trends of these 10 DELs in the qRT-PCR analysis were consistent with those deduced from the FPKM values (Additional file 2: Figure S1, Additional file 5: data S3). Hence, these results indicated that the transcriptomic analysis results were reproducible and reliable and would be useful for further studies of lncRNA functions (especially those of DELs) in GL and YL of ginkgo.
DEL and DEG coexpression analysis
Functional analysis of DEL target genes
Correlation analysis of lncRNAs and TFs
LncRNAs play an important role in diverse biological processes and have been widely studied in recent years [12, 15, 28, 29]. Nevertheless, lncRNAs remain poorly understood in the context of leaf-color mutation and pigment formation in ginkgo. Leaf-color mutation results from altered pigment formation and is characterized by a series of transitions that are coordinated by a network of interacting genes and pathways. Herein, we constructed coexpression networks for DELs and DEGs, predicted the target genes of lncRNAs, and performed correlation and functional analyses in leaves carrying the leaf-color mutation. Our study not only enriched the knowledge of lncRNAs but also provided new insights into the potential functions of lncRNAs in plants. These RNA-seq data might provide molecular targets that assist in the selective breeding and production of yellow-leaf ginkgo.
Bioinformatics technology for transcriptome analyses has improved rapidly, and many lncRNAs have been identified in plants. For example, 9686 lncRNAs were found in Norway spruce. Several studies have shown that lncRNAs are similar to protein-coding mRNAs, but the number of lncRNA transcripts is lower than that of mRNA transcripts, and lncRNAs usually exist in the nucleus [31, 32]. Our study confirmed the finding of Cui et al. that lncRNAs in ginkgo leaves are expressed at lower levels and are shorter than coding genes. Similar findings were reported in previous studies [31, 32], and this conclusion appears to be universal in plants. These common features may indicate the essential regulatory role of lncRNAs during growth, development and evolution. In addition, ginkgo lncRNAs contained fewer exons (mostly two exons) than the coding genes, which may be responsible for the differences in their evolution and function. This result is similar to those reported for the angiosperm poplar and the gymnosperm Norway spruce [30, 34]. LncRNAs are typically greater than 200 nucleotides in length, whereas only 3% of lncRNAs were > 1 kb in length in this ginkgo study. A similar condition was observed in the study of lncRNAs in maize. This indicates that only a small proportion of plant lncRNAs are longer than 1 kb.
RNA plays not only an auxiliary role as an intermediate carrier of genetic information but also a variety of regulatory roles. LncRNA is essentially RNA, a long chain composed of nucleotides, which can affect the biological activities of eukaryotes through various mechanisms of action [5, 7]. The functions of lncRNAs are highly complex and diverse, and unlike mRNA sequences, which can provide potential functional information, the sequence motifs of lncRNAs generally do not provide information for predicting lncRNA functions [9, 32]. However, lncRNAs may regulate gene expression in either a cis- or trans-acting manner. When lncRNAs regulate gene expression by acting on adjacent target genes, the process is known as cis-acting regulation [37, 38, 39]. Transposable elements can also regulate adjacent gene expression as cis-elements. To further analyze lncRNA function, we obtained 48 DELs with 72 target genes within a 100 kb range; these 72 target genes may therefore be regulated by DELs. Trans-acting lncRNAs regulate gene expression at independent loci; in this study, 31 DELs regulated 31 different target genes, which are involved in the pigment formation process and may be an important cause of the leaf-color mutation. In addition, several studies have shown that lncRNAs coordinate with miRNAs, forming multiple feedforward pathways to regulate a range of target genes [42, 43]. For example, some lncRNAs can act as precursors for miRNAs. Some studies have proposed that lncRNAs may also function as miRNA primary transcripts, targets, or target mimics, providing a new mechanism for the regulation of miRNA activity [45, 46, 47]. Thus, the identification and analysis of the correlation between lncRNAs and miRNA precursors will help elucidate regulatory processes. These results will also help explore the functions of the corresponding lncRNAs.
We detected six novel lncRNAs as precursors to known miRNAs that were identified in this study, which lays a foundation for the subsequent study of leaf-color mutations.
LncRNAs can participate in the regulation of gene expression through various mechanisms [49, 50, 51]. Several lncRNAs can regulate mRNAs by binding or interacting with their targets [28, 52]. This could be the result of direct interaction of lncRNAs with the promoter region or other cis-regulatory elements of their coexpressed protein-coding genes. To investigate whether ginkgo lncRNAs have the potential to interact with sequences of their targets, we performed GO and KEGG analysis on the cis- and trans-targets of lncRNAs. The results showed that the chloroplast thylakoid membrane subcategory was the most enriched in the CC category and that photosynthesis pathways (ko00195) were the most enriched in the KEGG pathway analysis. Changes in the highly enriched chloroplast thylakoid membrane subcategory may result in a green-deficient or otherwise abnormal leaf color. Chlorophyll is an important pigment related to photosynthesis, and leaf-color variations are closely related to pigment synthesis [26, 48]. This suggests that the complex mechanism of leaf-color variation requires a coordinated regulatory network of posttranscriptional gene expression. Although pigment synthesis may be a multifactorial phenotypic trait, only a few pathways regulating pigment synthesis have been validated [28, 56], and our understanding of the role of lncRNAs in pigment synthesis is very limited. Hence, we established a substantial number of coexpression modules to reveal the regulatory relationships and functions involved in leaf-color mutation or pigment synthesis. These coexpression analyses implied a functional correlation between lncRNAs and protein-coding genes, especially TFs (MYB-related and bHLH). LncRNAs interact with a myriad of genes encoding TFs. A previous study indicated that the trans-acting lncRNA HID1 associates with the chromatin of the TF gene PIF3 and can repress its transcription in Arabidopsis.
Several studies have shown that lncRNAs can regulate the activity of TFs [59, 60, 61]. Studies have indicated that the regulatory factors involved in pigment synthesis include MYB, bHLH and WD40, among which MYB plays the most important role in regulating anthocyanin synthesis, as has been demonstrated in apple and grape species [62, 63]. These results provide a useful resource for further research on pigment formation in plants.
In this study, we obtained 2044 lncRNAs including 238 DELs involved in the ginkgo leaf-color mutation through high-throughput sequencing. The results showed that 48 cis-acting DELs might regulate 72 target genes, and 31 trans-acting DELs might regulate 31 different target genes. The chloroplast thylakoid membrane subcategory and photosynthesis pathways (ko00195) were most enriched in GO and KEGG analyses. In addition, 32 DELs and 49 DEGs constituted coexpression networks, and eight DELs and 68 TFs had interaction networks. This study will provide a basis for subsequent studies on the molecular biology of the leaf color of ginkgo and will also provide a reference for the study of other plants in related fields.
Plant materials and RNA sequencing
Since ginkgo is listed as “Endangered” on the red list, we first obtained permission to collect ginkgo leaves and branches. Plant materials were collected from an approximately 150-year-old ginkgo tree in Jiujiang city, Jiangxi Province, China (29°49′ N, 116°40′ E). The ginkgo leaves exhibited GL and YL phenotypes on a main branch of the tree. The YL phenotype was identified as a xantha mutant (Ginkgo biloba “Wannianjin”) by Professor Fuliang Cao. The phenotypes of GL and the YL mutant are also shown in Additional file 2: Figure S1 of our previous research. In addition, several GL and YL scions were grafted onto rootstocks in the ginkgo germplasm nursery at Nanjing Forestry University Base. The samples were fully expanded mature leaves free from pests and diseases. Three leaves were sampled per replicate, with three replicates for each group, in the same period. Total RNA from these leaves (GL and YL) was extracted and purified as previously described, with slight modifications. The Illumina sequencing platform (HiSeq™ X) was used for reference-based transcriptome sequencing (Ginkgo genome: http://gigadb.org/dataset/100209).
Identification of lncRNAs
Expression level annotation and DEL screening
We used the reference transcripts as a library and determined the expression abundance of each transcript in each sample by sequence similarity alignment. Bowtie 2 and expression analysis were used. Transcript expression was quantified using the FPKM method. The read counts of each lncRNA in each sample were normalized with DESeq software, and the fold change was calculated. Differences in read counts were tested with a negative binomial distribution test. Finally, the DELs were screened according to the fold change and the results of the differential significance test.
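The FPKM and fold-change quantities mentioned above can be illustrated with a minimal sketch. It assumes the standard FPKM definition (fragments per kilobase of transcript per million mapped fragments) and a simple pseudocount-stabilized log2 fold change of YL relative to GL. The function names, the pseudocount and the example numbers are hypothetical, and the sketch does not reproduce the size-factor normalization or the negative binomial test performed by DESeq.

```python
import math


def fpkm(fragments: int, transcript_length_nt: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_nt <= 0 or total_mapped_fragments <= 0:
        raise ValueError("transcript length and library size must be positive")
    # 1e9 = 1e3 (per kilobase) * 1e6 (per million mapped fragments)
    return fragments * 1e9 / (transcript_length_nt * total_mapped_fragments)


def log2_fold_change(mean_yl: float, mean_gl: float, pseudocount: float = 1.0) -> float:
    """log2(YL / GL); the pseudocount (an assumption) avoids division by zero."""
    return math.log2((mean_yl + pseudocount) / (mean_gl + pseudocount))


# Hypothetical lncRNA: 702 fragments on a 702-nt transcript, 10 million mapped fragments.
example_fpkm = fpkm(702, 702, 10_000_000)
# Hypothetical normalized mean expression: 7 in YL and 1 in GL.
example_lfc = log2_fold_change(7.0, 1.0)
print(example_fpkm, example_lfc)
```

A positive log2 fold change in this convention means higher expression in YL than in GL; screening thresholds would be applied on top of these values together with the significance test.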
DEL and DEG coexpression analysis
Transcriptome sequencing assembly and functional annotations were performed according to the methods of Wu et al. (2018). To identify the DEGs, the two groups were statistically compared using DESeq software. Specifically, differential expression was tested using a negative binomial distribution and a shrinkage estimator for the variance of the distribution. The false discovery rate was used to set the p-value threshold in multiple testing when judging the significance of gene expression differences. A Pearson correlation test was used to determine the correlation between the expression data for the DELs and DEGs. Relationship pairs with a correlation coefficient greater than 0.8 and a p-value less than or equal to 0.05 were considered to have a coexpression relationship. The top 50 coexpression pairs were used to construct the coexpression network.
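The coexpression filter described above can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: the thresholds (r > 0.8, p ≤ 0.05, top 50) come from the text, while ranking the retained pairs by r, taking the threshold literally as positive r, and all identifiers and expression values are hypothetical. To stay within the Python standard library, the two-sided p-value is obtained by numerically integrating the null density of r, which is proportional to (1 − r²)^((n−4)/2) for n samples; this corresponds to the usual t-test on a Pearson coefficient.

```python
import math
from itertools import product


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("expression vectors must have the same length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant expression vector")
    return sxy / math.sqrt(sxx * syy)


def pearson_p_value(r, n, steps=2000):
    """Two-sided p-value for H0: no correlation, with n >= 4 samples."""
    if n < 4:
        raise ValueError("this sketch needs at least 4 samples")
    k = (n - 4) / 2

    def integral(a, b):
        # Midpoint rule for the integral of (1 - x^2)^k over [a, b].
        h = (b - a) / steps
        return h * sum((1 - (a + (i + 0.5) * h) ** 2) ** k for i in range(steps))

    r_abs = min(abs(r), 1.0)
    return integral(r_abs, 1.0) / integral(0.0, 1.0)


def coexpressed_pairs(del_expr, deg_expr, r_min=0.8, p_max=0.05, top_n=50):
    """Return up to top_n (DEL, DEG, r, p) tuples passing both thresholds."""
    hits = []
    for (l_id, l_vals), (g_id, g_vals) in product(del_expr.items(), deg_expr.items()):
        r = pearson_r(l_vals, g_vals)
        p = pearson_p_value(r, len(l_vals))
        if r > r_min and p <= p_max:
            hits.append((l_id, g_id, r, p))
    hits.sort(key=lambda h: h[2], reverse=True)  # ranking by r is an assumption
    return hits[:top_n]


# Hypothetical expression values for six samples (three GL, three YL).
dels = {"lnc_A": [1, 2, 3, 10, 11, 12]}
degs = {"gene_1": [2, 4, 6, 20, 22, 24], "gene_2": [5, 1, 4, 2, 6, 3]}
pairs = coexpressed_pairs(dels, degs)
print(pairs)
```

With six samples, a coefficient of exactly 0.8 gives a p-value of about 0.056, so the p-value criterion is slightly stricter than the correlation threshold at this sample size.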
Target gene prediction
Because lncRNAs mainly act on target genes in a cis- or trans-acting manner, target genes were predicted separately for the two cases. For cis-acting lncRNAs, all protein-coding genes located within 100 kb upstream or downstream of a lncRNA and significantly coexpressed with it were screened as target genes. For trans-acting lncRNAs, target genes were identified by the correlation of expression levels rather than by positional relationships. These lncRNA target genes were functionally annotated using the GO (http://geneontology.org/) and KEGG (http://www.genome.jp/kegg/) databases. Based on the results of the coexpression analysis of differentially expressed transcripts, lncRNAs and mRNAs located on different chromosomes were selected as candidate targets, and their sequences were extracted. The RNA interaction software RIsearch-v2.0 was used to predict the binding of candidate lncRNAs and mRNAs at the nucleic acid level. Under the screening conditions, the number of directly interacting bases between the two nucleic acid molecules was no less than 10, and the free energy of base binding was no more than 50; lncRNA and mRNA pairs passing this screen may have a direct regulatory relationship.
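The positional part of the cis-target screen can be sketched as a simple interval query. This is a minimal illustration that assumes 1-based inclusive coordinates and counts a gene as a candidate if any part of it overlaps the window extending 100 kb on either side of the lncRNA. The `Feature` class, the function names and the coordinates are hypothetical, and the set of significantly coexpressed genes is taken as an input rather than computed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def cis_candidates(lncrna, genes, window=100_000):
    """Names of genes on the same chromosome overlapping the +/- window region."""
    lo = lncrna.start - window
    hi = lncrna.end + window
    return [g.name for g in genes
            if g.chrom == lncrna.chrom and g.end >= lo and g.start <= hi]


def cis_targets(lncrna, genes, coexpressed, window=100_000):
    """Positional candidates that are also significantly coexpressed."""
    return [name for name in cis_candidates(lncrna, genes, window)
            if name in coexpressed]


# Hypothetical coordinates for one lncRNA and four protein-coding genes.
lnc = Feature("lnc_A", "chr1", 500_000, 501_000)
genes = [
    Feature("g1", "chr1", 450_000, 452_000),  # within 100 kb upstream
    Feature("g2", "chr1", 700_000, 702_000),  # same chromosome, too far away
    Feature("g3", "chr2", 500_000, 501_000),  # different chromosome
    Feature("g4", "chr1", 590_000, 600_500),  # within 100 kb downstream
]
nearby = cis_candidates(lnc, genes)
targets = cis_targets(lnc, genes, coexpressed={"g1", "g2"})
print(nearby, targets)
```

Here `g4` is close enough but not coexpressed, and `g2` is coexpressed but too distant, so only `g1` satisfies both conditions.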
Correlation analysis of lncRNAs and TFs
For each DEL, the coexpressed coding genes were identified, and the significance of the enrichment of differentially expressed mRNAs in each TF entry was calculated using the hypergeometric distribution test. The calculation returned a p-value indicating the significance of the enrichment. Then, the intersection of the set of coding genes coexpressed with the lncRNA and the TF set was calculated. The hypergeometric distribution was used to calculate the degree of enrichment of the intersection, and the TFs significantly related to lncRNAs were obtained, thereby identifying the TFs that may play a regulatory role in combination with lncRNAs.
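The hypergeometric enrichment test described above can be written out explicitly. The sketch below computes the upper-tail probability P(X ≥ k), that is, the chance of seeing at least k genes from a TF set among the genes coexpressed with one lncRNA if genes were drawn at random. The parameter names and the example counts are hypothetical, and how the background gene set is defined is an assumption left to the user.

```python
from math import comb


def hypergeom_enrichment_p(population: int, successes: int, draws: int, observed: int) -> float:
    """Upper-tail hypergeometric p-value P(X >= observed).

    population: number of background genes
    successes:  background genes belonging to the TF set
    draws:      genes coexpressed with the lncRNA
    observed:   coexpressed genes that belong to the TF set
    """
    if not (0 <= successes <= population and 0 <= draws <= population):
        raise ValueError("successes and draws must lie between 0 and population")
    if not (0 <= observed <= min(successes, draws)):
        raise ValueError("observed must lie between 0 and min(successes, draws)")
    upper = min(successes, draws)
    tail = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(observed, upper + 1))
    return tail / comb(population, draws)


# Hypothetical counts: 20 background genes, 5 in the TF set,
# 4 genes coexpressed with the lncRNA, 3 of which are in the TF set.
p_value = hypergeom_enrichment_p(population=20, successes=5, draws=4, observed=3)
print(p_value)
```

In this toy example the p-value is 155/4845, about 0.032, so the overlap would count as enriched at a 0.05 threshold before any multiple-testing correction.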
Real-time quantitative PCR validation
We randomly selected 10 DELs from the results of the transcriptional analysis and confirmed them by qRT-PCR. All qRT-PCR experiments were performed on an ABI ViiA 7 Real-time PCR platform (Applied Biosystems, Carlsbad, CA, USA). All reactions were performed in triplicate. The PCR program was performed according to Xu et al., and the glyceraldehyde-3-phosphate dehydrogenase gene (forward primer [5′-3′]: GGTGCCAAAAAGGTGGTCAT; reverse primer [5′-3′]: CAACAACGAACATGGGAGCAT) was used as a reference gene. All primers for lncRNAs were designed with Oligo v6.0 software and are listed in Additional file 1: Table S1. We normalized the relative expression of the genes with the 2^(−ΔΔCt) method.
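The 2^(−ΔΔCt) normalization can be illustrated with a short worked example. The sketch assumes that triplicate Ct values are averaged, that the reference gene is the glyceraldehyde-3-phosphate dehydrogenase gene as stated above, that GL serves as the calibrator sample, and that amplification efficiency is close to 100%, as the method requires. The function name and all Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Relative expression by the 2^(-ddCt) method from replicate Ct values."""
    d_ct_sample = mean(ct_target_sample) - mean(ct_ref_sample)
    d_ct_calibrator = mean(ct_target_calibrator) - mean(ct_ref_calibrator)
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2.0 ** (-dd_ct)


# Hypothetical triplicate Ct values for one lncRNA.
fold = relative_expression(
    ct_target_sample=[22.0, 22.5, 21.5],      # lncRNA in YL, mean 22.0
    ct_ref_sample=[18.0, 18.5, 17.5],         # reference gene in YL, mean 18.0
    ct_target_calibrator=[24.0, 24.5, 23.5],  # lncRNA in GL, mean 24.0
    ct_ref_calibrator=[18.0, 18.0, 18.0],     # reference gene in GL, mean 18.0
)
print(fold)
```

Here ΔCt is 4 in YL and 6 in GL, so ΔΔCt is −2 and the lncRNA is estimated to be fourfold more abundant in YL than in GL.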
GW and FC conceived and designed the project. YW and JG participated in the data analysis. YW drafted the manuscript. TW and GW revised the manuscript. All authors read and approved the final manuscript.
This study was supported by the Special Fund for Forest Scientific Research in the Public Welfare (201504105), the Agricultural Science and Technology Independent Innovation Funds of Jiangsu Province (CX(16)1005), the National Key Research and Development Program of China (2017YFD0600700), the Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX18_0954), and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD). The funding bodies provided financial support for the research projects (experimental costs and publication fees) but were not involved in the experimental design, data analysis or preparation of the manuscript.
Ethics approval and consent to participate
Plant materials were collected from a ginkgo tree in Jiujiang city, Jiangxi Province, China. Sampling was permitted by the Ginkgo Engineering Technology Research Center of the State Forestry Administration.
Consent for publication
The authors declare that they have no competing interests.
- 67. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map (SAM) format and SAMtools. Bioinformatics. 2009;25:2078–9.
- 73. Anders S, Huber W. Differential expression of RNA-Seq data at the gene level–the DESeq package. European Molecular Biology Laboratory: Heidelberg, Germany; 2012.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.