Introduction

Soybean [Glycine max (L.) Merr.] is a short-day plant domesticated about 5000 years ago in temperate region of China at approximately 35°N latitude (Hymowitz, 1970; Carter et al. 2004). Nowadays, soybean is one of the important field crops, providing dietary proteins, and edible oil worldwide, and the demand for soybean production in tropical regions is rising but the available high-yielding soybean varieties originally bred for middle and high latitudes perform poorly in tropical regions because of inadequate adaptation. When temperate elite soybean varieties are grown in the tropics, a majority of them bloom nearly 3 weeks after emergence leading to low yields (Destro et al. 2001). Short day (SD) (~ 12 h) leads to a significant reduction of the soybean vegetative growth. The short stature and a low leaf area index when the vegetative stage of development is stunted leads to reduced yields (Sinclair and Hinson 1992). Therefore, there is a need for sustained research and development aiming at adapting high yielding temperate varieties to tropical climates to increase global soybean production since the demand keeps rising.

The long juvenile (LJ) trait which delays flowering in soybean under SD conditions was discovered and has resulted in a major breakthrough in the expansion of soybean commercial production at latitudes below 22° (Hartwig and Kiihl 1979; Neumaier and James 1993; Spehar 1995). An assessment of the genetic control of the LJ trait in soybean by Destro et al. (2001) reported that up to five genes can control the LJ trait and that recessive alleles at two loci are necessary for the LJ trait. Studies by Yue et al. (2017) and Lu et al. (2017) concluded that J is the dominant functional allele of GmELF3a, an ortholog of ELF3 in Arabidopsis, which inhibits flowering, while j is the recessive, loss-of-function allele that causes delayed flowering and confers the LJ trait that is associated with the adaptation of soybean to tropical regions. In both studies, the J gene was confirmed by transgenic complementation of the GmELF3a gene in a LJ variety. A recent report by Sun et al. (2019) speculated that the LJ trait could also result partly from a combination of various dysfunctional alleles at GmFT2a loci. FLOWERING LOCUS T (FT) is an important flowering integrator in Arabidopsis, with homologs in many other plants. The manipulation of FT homologs in soybean can facilitate the creation of LJ soybean germplasm (Cai et al. 2020; Li et al. 2020a, b).

Numerous studies with the model plant Arabidopsis enhanced the understanding of the molecular mechanisms of flowering in many plants. Many of these studies have identified important genes, such as FT and CONSTANS (CO), and their involvement in flowering (Andres and Coupland 2012; Itoh and Izawa 2013; Pin and Nilsson 2012). Up to date, a lot of genes and loci have been identified to play a vital role in regulating photoperiodic flowering in soybean (Lin et al. 2021). Some of the genes/loci include E1 (Xia et al. 2012), E2 (Watanabe et al. 2011), E3 (Watanabe et al. 2009), E4 (Liu et al. 2008), GmFT2a (Kong et al. 2010; Sun et al. 2011), GmFT4 (Zhai et al. 2014; Samanfar et al. 2017), J/GmELF3a (Lu et al. 2017; Yue et al. 2017), GmFT5a (Kong et al. 2010), GmFT1a/ GmFT1b (Guo et al. 2015; Liu et al. 2018), GmFT2b (Fan et al. 2014; Chen et al. 2020), GmFT2c (Wu et al. 2017), GmFT3a/ GmFT3b (Fan et al. 2014), GmFT5b (Fan et al. 2014), GmFT6 (Wang et al. 2015), GmFT7 (Zhang et al. 2021), PRR3a/ PRR3b (PRR37) (Wang et al. 2020; Li et al. 2020a, b; Lu et al. 2020), GmLux1, and Lux2 (Fang et al. 2021; Bu et al. 2021).

Transcriptome analysis enabled by advances in RNA sequencing (RNA-Seq) and computation biology is a powerful tool to identify gene expression differences and correlations with genetic/developmental cues or environmental conditions (Gillman et al. 2019). RNA-Seq uses deep-sequencing technology that provides a far more precise measurement of the levels of transcripts than other methods (Wang et al. 2009). With the above merits and the low cost nowadays, RNA-Seq has been applied in a lot of studies on flowering in Arabidopsis thaliana and soybeans. In this study, we used RNA-Seq to identify transcriptional variations between HX3, a mutated soybean variety, and the transgenic soybean line overexpressing J gene under SD and long day (LD) condition. This is to help identify the expression of flowering-related genes and the possible pathways which resulted in the delayed flowering in the LJ soybean under SD condition.

Materials and Methods

Soybean Genetic Materials

A natural LJ soybean variety Huaxia-3 and a transgenic line (Joe) overexpressing the J gene were used in the study. The transgenic line was Huaxia-3 transformed with the genome version of the J gene from Zhonghuang-24, a cultivar with conventional juvenility (Yue et al. 2017), and served as the control of the LJ variety. The LJ soybean HX3 was untransformed but has the recessive form of the J gene (j). Each sample for RNA-Seq was collected from the unifoliate leaves (V1 developmental stage) from three randomly selected individual plants, under short- and long-day conditions. All the samples were collected about 9 am and immediately frozen in liquid nitrogen and stored at − 80 °C until RNA extraction. The samples were prepared and coded as HX3L, HX3S, JoeS, and JoeL which refer to Huaxia-3 under long day, Huaxia-3 under short day, transgenic line under short day, and transgenic line under long day, respectively.

Phenotypic Evaluation Under Short- and Long-Day Conditions

The LJ soybean HX3 and the transgenic line were grown under SD (12 h light and darkness) and LD conditions (16 h light and 8 h darkness). The seeds were planted in pots (five plants per pot) on the 16th of May 2019 at the Changping Experimental Station of Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing, China. Under the SD treatment, the seedlings were placed under natural sunshine for 12 h followed by 12 h in darkness from 7:00 AM to 7:00 PM. For LD treatment, daytime was 5:00 AM–9:00 PM (12 h daylight + 4 h artificial light). The experiments were arranged in a complete randomized design with three replications. Flowering time was recorded at the VE-R1 stage (days from emergence to the first open flower) and maturity was recorded at the VE-R7 stage (days from emergence to physiological maturity) as described by Fehr et al. (1971). For phenotypic evaluation, five individual plants were analyzed per each replicate. SPSS software package was used to analyze the phenotypic data (version 19.0; SPSS Institute Ltd., Armonk, NY, USA).

Total RNA Extraction and cDNA Library Construction

Total RNA was isolated using the TRIzol reagent (Invitrogen) according to the manufacturer’s recommendation. RNA degradation and contamination were verified on 1% agarose gels and the purity checked using the Nano Photometer spectrophotometer. The cDNA library construction and sequencing (on an Illumina Hiseq X ten platform, San Diego, CA, USA) were carried out by Novo gene Co., Ltd. (Beijing, China).

Transcriptome Sequencing

The processing of raw reads was accomplished through in-house PERL scripts. This involves pruning raw data by removing some reads containing adaptor sequences, ploy-N-containing reads, and low-quality sequences in order to have clean reads. The quality scores for assessing sequencing accuracy were computed for the clean reads, Q20, Q30, and the GC content. The high-quality clean data was used in all the downstream analyses. The clean reads were mapped to the soybean reference genome sequence using a spliced aligner Tophat 2.0.12 software (Kim et al. 2013). Only reads with a perfect match or one mismatch were further analyzed and annotated based on the reference genome. Add to that, HTSeq v 0.6.1 software (Anders et al. 2015) was used to count the read numbers that were mapped to each gene. The quantification of gene expression levels was estimated by reads per kilobase of transcript per million mapped reads for each gene and log2 transformed (Trapnell et al. 2010). After the alignment of the reads using Cufflinks v2.2.1 (Trapnell et al. 2012), the assembled transcripts were further cleaned using the fragments per kilobase per million mapped reads (FPKM) value > 1. The sequence data sets were deposited at the National Centre for Biotechnology Information (NCBI) Sequence Read Archive (SRA) repository, [http://www.ncbi.nlm.nih.gov/sra with the accession PRJNA663421].

Analysis of the RNA-Seq Data

Differential expression analyses of the LJ soybean HX3 and transgenic line (Joe) (three biological replications per treatments) were performed using the DESeq R package (1.18.0) (Anders et al. 2015). DESeq software enables statistical procedures for determining differential expression in digital gene expression data based on the negative binomial distribution (Wang et al. 2010). The resulting p-values were adjusted using Benjamini and Hochberg’s approach to control the false discovery rate (Benjamini and Hochberg 1995). Genes with an adjusted p-value < 0.05 found by DESeq were assigned as significantly differentially expressed only when the fold change was at least one (> 1 or <  − 1 in log2 fold change value). We used the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (Kanehisa et al. 2008) to analyze the biological pathways of the identified differentially expressed genes (DEGs). Also, the KOBAS 2.0 web server (http://kobas.cbi.pku.edu.cn/) (Xie et al. 2011) was used to test the statistical enrichment of DEGs in the KEGG pathways. Using the hypergeometric test, based on the Student’s t-test, all KEGG pathways that had a p-value < 0.05 were considered to be significantly enriched. Gene Ontology (GO) enrichment analysis of differentially expressed genes was done using the GOseq R package, in which gene length bias was corrected. GO terms with corrected p value less than 0.05 were considered significantly enriched by differential expressed genes.

Real-Time (RT)–Quantitative PCR

Twelve RNA samples for the RNA seq experiment were used in the RT-qPCR experiment. First-strand cDNA was synthesized from 1 µg of the total RNA using a Fast Quant RT Kit (Tiangen Biotech, Beijing, China). RT-qPCR was performed on Quant Studio 7 Flex (Applied Biosystems, Waltham, MA, USA) using KAPA SYBR® FAST qPCR Kits (KAPA Biosystems, Wilmington, MA, USA) Master Mix. Three biological replications and three technical replications were analyzed for each sample. Data were analyzed using the 2 − ∆∆Ct method (Schmittgen and Livak 2008) with the GmActin (Glyma.18g290800) gene used as the reference gene. The primers used for RT-qPCR in this study are listed in Table S1.

Results

Phenotypic Analysis of the Long Juvenile (LJ) Soybean and Complementary Transgenic Line Under Different Photoperiods

There were significant differences (p ≤ 0.05) between the LJ soybean variety Huaxia-3 (HX3) and Joe (J overexpression), a complementary transgenic line in days to flowering and maturity (Table 1). The LJ soybean had delayed flowering and maturity for about 10 and 6 days respectively later than the transgenic line under SD condition. Furthermore, under LD condition, both the LJ and the transgenic line did not flower up to 90 days after emergence. Thus, the LD experiment was terminated because of the onset of frost in autumn.

Table 1 Flowering and maturity of long juvenile and transgenic soybean under SD condition

RNA-Seq Analysis

Twelve samples comprising of LJ and transgenic soybeans under SD and LD conditions with 3 biological replications totaling 36 samples were profiled at the V1 (unifoliate) growth stage. The total number of clean reads generated for the 12 samples in the RNA-Seq experiment was 674,800,494 of which 626,517,161 (96.2%) was uniquely mapped to the soybean reference genome (i.e., Gmax_275) (Table 2). Average GC content was 45.13% and an average of 97.74% of clean reads had quality scores at Q20 and 93.38% at Q30 (Table 2). To be sure of the similarity of the replications, we conducted a Pearson correlation test. The correlation coefficients among the samples were very high, r2 ≥ 0.90 (Fig. 1) indicative of the similarity of the replications.

Table 2 Summary of transcriptome sequencing statistics
Fig. 1
figure 1

Correlation coefficients among the read count of the twelve samples comprising a long juvenile soybean (HX3) and the transgenic (Joe) with 3 biological replications under SD and LD conditions. The lower diagonal represents the correlation coefficient; *** represents significance at p < 0.05. The upper diagonal plots show the distribution for each of the 12 replications.1, 2, 3 refers to the 3 replications

Numbers and Expression Patterns of Differentially Expressed Genes (DEGs) and Transcription Factors in the LJ Soybean Under Different Photoperiods

Fragments per Kilobase Million (FPKM) values were used to standardize the expression levels of the genes to enable the comparison of the LJ soybean to the transgenic line. Figure 2 shows the numbers and the expression patterns of all the DEGs under LD and SD conditions. A total of 31 genes were differentially expressed in the LJ soybean HX3 relative to the transgenic line under SD condition. Eighteen of the DEGs were upregulated and 13 were downregulated (Table S2). Under LD condition, however, 2311 DEGs were observed, comprising of 1065 upregulated and 1246 downregulated genes (Table S3). Nine genes were common under both SD and LD conditions with 22 and 2302 being exclusive to SD and LD, respectively (Fig. 2b). A total of 105 transcription factors (TF) were differentially expressed in the LJ soybean HX3 under both photoperiod and classified into 37 families (Table S4). However, only 2 TFs WRKY and SRM were differentially expressed under SD (Table S5). The commonest of the TFs under LD are bHLH, MYB, ERF, and WRKY (Table S5).

Fig. 2
figure 2

Numbers and expression pattern of differentially expressed genes (DEGs). a Volcano plots of DEGs between the long juvenile (LJ) soybean and the transgenic under short day (SD) (HX3S vs. JoeS). b LJ and transgenic under LD condition (HX3L vs. JoeL). c Venn diagrams showing the DEGs shared between LJ under SD (HX3S vs. JoeS) compared to LJ under LD condition (HX3L vs. JoeL)

Annotation of Differentially Expressed Genes (DEGs) in the LJ Soybean

To identify the genes that are differentially expressed in the LJ soybean HX3, a functional categorization was carried out by GO analysis to identify the potential use of these genes. The GO analysis grouped DEGs into three broad categories, biological function, cellular function, and molecular function. Eighteen DEGs under SD were assigned, and 37 GO terms were over represented with none of them significant at a corrected p ≤ 0.05 (Fig. S1a, Table S6). Under LD condition, however, 1767 DEGs were assigned various GO terms of which 55 were over represented at a corrected p ≤ 0.05. (Fig. S1b, Table S7). Most of the DEGs under LD condition were involved in biological processes (Fig. S1bB).

Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway Analysis of the DEGs in the LJ Soybean

A KEGG database resource for identifying high-level functions of biological systems, especially large-scale molecular datasets generated by genome sequencing was used to test the statistical enrichment of DEGs in the KEGG pathway. According to the KEGG analysis, 21 of the DEGs under SD condition were assigned to various pathways. By comparing the LJ soybean HX3 with the transgenic line under SD condition, the top most enriched pathways are shown in Fig. 3a. The circadian pathway and ether lipid metabolism were the most enriched pathway (p ≤ 0.05) (Table S8) and the circadian pathway included six genes. Under LD condition, 1020 out of the 2311 DEGs were assigned to various pathways. The top most enriched pathways are presented in Fig. 3b. Ribosome, protein processing in endoplasmic reticulum, plant hormone signal transduction, sulfur metabolism, linoleic acid metabolism, sesquiterpenoid, triterpenoid biosynthesis, and circadian rhythm were the most enriched pathway (p ≤ 0.05) (Table S9).

Fig. 3
figure 3

Statistics of pathway enrichment between the long juvenile (LJ) and the transgenic soybean. a LJ soybean under LD condition compared to transgenic soybean under SD (HX3S vs. JoeS). b LJ soybean under LD condition compared to transgenic under LD (HX3L vs. JoeL). The left Y-axis indicates the KEGG pathway. The X-axis indicates the Rich factor. The color gradient denotes the size of the q value; the color is from red to green; the red represents the smaller the q value, and the higher the significant level of enrichment of the matching KEGG pathway. The rich factor shows the ratio of the number of DEGs to the total gene number in certain pathways

Effect of Photoperiod on the Expression of Flowering Related GenesInvolved in the Circadian Clock Pathway in the LJ Soybean

Flowering related genes involved in the circadian clock pathway in the transcriptome showed that transcripts associated with soybean circadian rhythm pathway differed significantly between LJ soybean HX3 relative to the transgenic under SD condition. Table 3 and Fig. 4 shows the list and the expression pattern of circadian rhythms DEGs in the LJ soybean under both SD and LD conditions. The circadian clock pathway had one downregulated genes in the LJ soybean HX3 under SD which was the overexpressed gene Glyma.04G050200 (GmELF3a). Two FT homologs, Glyma.16G044100 (GmFT5a) and Glyma.16G150700 (GmFT2a) and two uncharacterized novel genes, Novel02840 and Novel02485 were also differentially expressed (Table 3).

Table 3 Circadian clock flowering related DEGs in the long juvenile soybean relative to the transgenic soybean under SD and LD
Fig. 4
figure 4

Heat map analysis of flowering related genes involved in the circadian clock pathway in the long juvenile (LJ) soybean. The values used to draw heat map are Log2 (fold change) of expression levels of the DEGs. HX3L vs. JoeL refers to LJ soybean under LD compared to the transgenic under LD, HX3S vs. JoeS refers to LJ soybean compared to transgenic soybean under SD

Under LD condition, two circadian rhythm pathway flowering-related genes were downregulated in the LJ soybean HX3. This included Glyma.16G027200 (SPA, SUPRESSOR OF PHYA), and a TF Glyma.19G224700 (PIF3, phytochrome-interacting factor 3) (Table 3). However, seven other circadian flowering related clock genes were upregulated (Table 3). Analysis of flowering related genes involved in the circadian clock pathway in the LJ soybean HX3 under both photoperiods shows that the overexpressed gene in the transgenic (Joe) Glyma.04G050200 (GmELF3a) was downregulated under both photoperiods (Fig. 4) but was not differentially expressed under LD. Figure S2 shows the soybean circadian clock genes in a biological network highlighting the DEGs between the LJ soybean HX3 and the transgenic under SD condition.

Confirmation of Gene Expression Profiles Using RT-qPCR

Eleven flowering related DEGs were selected to confirm the RNA-Seq results. Four out of the five genes under SD condition showed the same trends with the RNA-Seq data. Under LD condition, five out of the six genes showed a similar trend with the RNA-Seq data. Glyma.08G363100 (GmFT4) was downregulated under SD and upregulated under LD condition showing the same trend as the RNA-Seq result. These lend credence to the validity of the RNA-Seq experiment. Nine genes having similar trends as RNA-Seq are presented in Fig. 5.

Fig. 5
figure 5

Comparison of RNA-seq analysis by reverse transcription-quantitative polymerase chain reaction (RT-qPCR). A–D genes under SD condition. D and E are both under short day (SD) and long day (LD) condition. E–I are genes under LD condition

Discussion

Our study focused on the profile of the transcriptome of a natural LJ soybean with the recessive j gene relative to the transgenic soybean overexpressing the J/GmELF3a gene. This gave us an insight into the clock related flowering genes differentially expressed in the LJ soybean HX3 compared to the transgenic soybean under different photoperiods. RNA-Seq is a high-throughput sequencing technology that enables the study of comprehensive transcriptional profile and has been widely used to study molecular responses to abiotic and biotic stresses and for comparing transcriptomes under different treatments (Zeng et al. 2019; Luo et al. 2019; Gillman et al. 2019; Wang et al. 2009).

The LJ soybean had delayed flowering for about 10 days and matured later than the transgenic under SD (Table 1). This is attributed to the recessive allele (j) in the LJ soybean since the functional J gene results in early flowering under SD condition and is in consonance with a previous study by Yue et al. (2017) where transgenic complementation of the J gene in the LJ soybean HX3 resulted in early flowering. This was also confirmed by the down-regulation of the overexpress gene GmELF3a in the HX3 (Table 3). Under LD condition, interestingly, both the transgenic and the LJ soybeans did not flower more than 90 days after emergence, even though the transgenic line had the functional J gene. The ortholog of J gene in Arabidopsis ELF3 has been reported to trigger early flowering under LD condition (Zagotta et al. 1996). The non-flowering of both the transgenic line and the LJ soybean HX3 could be as a result of their genetic background since LJ varieties with either functional or the recessive j gene hardly flower under LD conditions.

We observed more DEGs under LD than SD. The higher number of DEGs under LD compared to SD may be as a result of the differences between the growth and development under the different photoperiods. Soybean is a SD crop; therefore, the LD may have imposed stress on it. It should be noticed that many phenotypes such as flowering day exhibit a small variance under SD than under LD. A study by Jiang et al. (2017) involving wheat also reported few numbers of DEGs under its normal growth conditions compared to salt stress condition.

The KEGG analysis of the transcriptome data revealed that the circadian clock pathway was the most significantly enriched pathway in the LJ soybean under SD condition (Fig. 3a). This is indicative that the circadian pathway played a role in the delayed flowering of the LJ soybean under SD. The circadian clock in plants regulates seasonal changes in day length and temperature to control flowering time. Circadian clock genes have been reported to be involved in regulating the photoperiod of flowering in plants (Imaizumi and Kay 2006; Hudson 2010). This ensures that the transition from vegetative to reproductive stages happens under optimum physiological and environmental conditions, thereby enhancing reproductive success (Piggins et al. 2011).

In Arabidopsis, FT is an important flowering promoter, and also serves as integrators of internal and external flowering indicators and is widely conserved in many plant species (Pin and Nilsson 2012). Two of the downregulated FT homologs in the LJ soybean under SD condition GmFT2a and GmFT5a play significant roles in the induction of flowering under SD. GmFT2a and GmFT5a were reported to cause delayed flowering when downregulated by RNA interference (RNAi) (Guo et al. 2015). Also, recent studies of CRISPR/Cas9-generated loss-of-function mutants of these genes revealed these genes were involved in flowering induction, with a delayed flowering observed under SD for GmFT2a mutants (Cai et al. 2018, 2020). These are in consonance with our study where the down regulation of these two genes seems to trigger delayed flowering in the LJ soybean.

Contrarily to most florigen proteins, GmFT4 and GmFT1a inhibit flowering (Liu et al. 2018). GmFT4 was slightly upregulated in the LJ soybean under SD condition. GmFT4 was reported as a flowering repressor by previous studies and a candidate for the E10 gene in soybean (Samanfar et al. 2017). A study by Zhai et al. (2014) reported that the LJ soybean HX3 showed a relatively high GmFT4 expression under SD condition. This aligns with our study in which the GmFT4 was upregulated under SD in the LJ soybean. Under LD, only two FT genes were differentially expressed in the LJ soybean HX3, and this includes GmFT4 which was slightly downregulated and GmFT1b which was slightly upregulated (Table 3). GmFT1a and GmFT4 were reported to be strongly induced under LD conditions (Liu et al. 2018; Zhang et al. 2016).

The other circadian clock genes such as PRR7, PRR5, CRY1, CRY1a, and Novel 03,244 (Gigantea like protein) were slightly upregulated in the LJ soybean HX3 under LD (Table 3). PRRs are important parts of transcriptional circadian networks in plants (Farre and Liu 2013). Li et al. (2020a, b) reported that the PRR3 and its homologue PRR7 may be involved in flowering regulation through the circadian pathway. CRY is the photolyase-related blue-light receptors that regulate plant’s responses to light and the circadian clock major evolutionary lineages (Schmitz et al. 2005). During the shift from the vegetative to reproductive phases, the biological circadian clock genes PRR5, PRR7, and SPA1, act on the downstream gene CRY2 to affect the flowering process in Arabidopsis (Nakamichi et al. 2010; Weidler et al. 2012). CRY2 regulates photoperiodic flowering in Arabidopsis; however, in soybean, CRY1a is reported to be one of the major regulators of photoperiodic flowering in soybean (Jung et al. 2012). GIGANTEA protein in Arabidopsis was reported to play multiple roles in the circadian clock and flowering (Watanabe et al. 2011) but its role in soybean has not been fully understood.

Conclusion

We compared changes in the expression of circadian clock flowering related genes in a natural LJ soybean variety with a loss of function allele for the J gene and a transgenic line with the overexpressed gene under different photoperiods. Analysis of the RNA-Seq data shows that the delayed flowering under SD may be as a result of the down-regulation of GmELF3a together with some key FT genes in the LJ soybean under SD condition. Our study adds to the evidence emerging that FT genes may also be implicated in the LJ trait in soybean under SD. Future studies should explore the non-flowering of both the LJ soybean HX3 and transgenic line under LD condition to identify the genes involved. This will contribute to the knowledge gap in flowering time regulation in soybean under tropical conditions.