Introduction

The golden camellia, often referred to as the "flora panda" and the "Camellia queen," is a renowned ornamental plant (Wei et al. 2005). This species belongs to the Chrysantha Chang section, the only group within the Camellia genus (Theaceae) that features yellow flowers. The native distribution of golden camellia is concentrated in the Guangxi Zhuang Autonomous Region in Southern China and Northern Vietnam (Su and Mo 1988). Over time, this species has been introduced to other provinces in China, including Guangdong, Fujian, Zhejiang, Yunnan, and Hunan, as well as to various international locations such as Japan, Australia, Europe, and North America, where successful domestication, cultivation, and breeding have been achieved (Wu et al. 2020). Owing to their unique yellow-flower characteristics, golden camellia species are regarded as highly valuable for breeding new Camellia varieties. Through long-term efforts in breeding programs, significant advancements have been made in utilizing golden camellia as a breeding material (Li et al. 2019a; Huang et al. 2021b; Liao et al. 2022; Liu et al. 2022b). Notably, Chinese researchers have successfully developed well-known Camellia hybrids, including "Zhenghuangqi," "Huangxuanlv," and "Xinshiji" (Liu et al. 2022b). Additionally, golden camellia species are abundant in various beneficial secondary metabolites, such as flavonoids and polyphenols, which make their yellow flowers and emerging scale leaves highly sought after for herbal tea production, contributing to their significant commercial value (Lin et al. 2010; Zhao et al. 2022). However, despite these achievements, further progress in breeding, research, and commercial exploitation of these species has been limited by the lack of understanding of the molecular mechanisms governing yellow flower formation and the rarity of the golden camellia.

Flavonoids are a class of natural secondary metabolites in plants, known for their diverse beneficial properties, including antioxidant, anti-inflammatory, immune-boosting, and anti-cancer effects (Faggio et al. 2017). Numerous studies have linked flavonoids and the genes involved in their biosynthesis to the development of yellow pigments in plants. For instance, an early study demonstrated that Arabidopsis thaliana flavanone 3-hydroxylase (F3H) mutants produced yellow seeds (Wisman et al. 1998). In Brassica crops, Ren et al. (2021) showed that yellow seed coat coloration is influenced by the concentration of flavonoid metabolites and the expression of related biosynthetic genes. Similarly, metabolomic analyses of Carthamus tinctorius (Ren et al. 2022) with red and yellow flowers, and Camellia reticulata (Geng et al. 2022) with red, pink, and white flowers, revealed significant differential accumulations of flavonoid metabolites, indicating that flavonoid glycoside biosynthesis plays a dominant role in flower color transitions. Recent studies have further identified transcription factors in Camellia sinensis (Gao et al. 2024) and C. japonica (Zhang et al. 2024) that regulate flavonoid accumulation by co-expressing with flavonoid biosynthetic genes, thus affecting flower or leaf color. These findings suggest that yellow pigment formation in plants is regulated by a complex network involving multiple metabolites and genes. However, more research is needed to pinpoint the key regulatory elements. In golden camellia species, certain flavonoid components, such as kaempferol-3-O-glucoside, quercetin-3-O-glucoside, quercetin-3-O-rutinoside, and quercetin-7-O-glucoside, are thought to be critical to yellow flower development (He et al. 2018; Li et al. 2019a). Previous transcriptomic studies have been conducted to explore the molecular mechanisms underlying yellow flower formation in golden camellia (Zhou et al. 2017; Li et al. 2018; Liu et al. 2022b; Yu et al. 2024). For example, transcriptome sequencing (RNA-seq) analysis of petals from C. nitidissima and its hybrids provided insights into the regulatory pathways involved in yellow flower formation during hybridization (Liu et al. 2022b). Despite these efforts, most studies have focused solely on transcriptomic data from a limited number of species, particularly C. nitidissima, leaving a gap in understanding the metabolic changes directly associated with yellow pigment production. Addressing this gap requires integrating both transcriptomic and metabolomic approaches.

Metabolomics, the study of metabolic changes, serves as a critical foundation for deciphering metabolic pathways involved in complex biological processes in plants (Fiehn 2002). The ongoing enhancement of plant metabolite databases has significantly improved the detection of common plant metabolites. Among the advanced techniques, the widely targeted metabolome has become highly prominent in plant metabolomics research due to its advantages of high throughput, sensitivity, and extensive coverage (Wu et al. 2020; Zhao et al. 2023; Chen et al. 2023). By integrating multi-omics approaches, including the widely targeted metabolome and RNA-seq, key regulatory metabolites and genes have been uncovered in intricate biological processes such as leaf development in Acer rubrum (Lu et al. 2020), seed development in Zanthoxylum bungeanum (Fei et al. 2020), and flower bud development in Eucommia ulmoides (Qing et al. 2022), offering new insights into plant growth and development mechanisms.

This study focused on C. perpetua, a perennial species from the Chrysantha section and the only golden camellia species that flowers continuously throughout the year (Yang et al. 2021b). Unlike other golden camellia species, C. perpetua produces flowers year-round, allowing for multiple fruit and seed harvests annually. As a result, this unique species presents exceptional breeding material with strong commercial potential. The primary objective of this study was to explore the correlation between metabolite accumulation and gene expression during yellow flower development. Additionally, the research sought to elucidate the molecular mechanisms governing yellow flower formation through a comprehensive analysis of metabolomic and RNA-seq data collected from C. perpetua flowers at various developmental stages.

Materials and methods

Plant materials

Camellia perpetua specimens utilized in this study were cultivated at the golden camellia germplasm resource nursery of the Guangxi Institute of Botany (25°4′14"N, 110°17′58"E). Three 15-year-old plants, chosen as biological replicates due to their consistent flowering patterns observed over the previous two years, were selected for the experiment. Given that June marks the peak blooming period for C. perpetua, flower samples were collected from each plant on a clear June morning in 2022, between 9:00 and 10:00 am. The flowering stages were categorized into five distinct phases: young bud (S1), early bud (S2), yellowing (S3), expansion (S4), and blooming (S5), corresponding to different developmental levels (see Fig. 1a). Floral buds were harvested from four cardinal directions (east, south, west, and north) of each plant. Due to the smaller bud size at stages S1 and S2, at least ten samples were gathered, while a minimum of four samples were collected at later stages. All samples were flash-frozen in liquid nitrogen and stored at -80 °C for subsequent analysis.

Fig. 1

Metabolomic Profiling Across Five Stages of Flower Development. a Five distinct developmental stages of the flower. b Principal component analysis results. QC, quality control. c A total of 1,160 metabolites identified. "Class" refers to metabolite classification, while "Sub-class" denotes groups of metabolites with similar trends, as identified through K-means clustering. "Up" and "Down" indicate metabolites that were up- or down-regulated, respectively, across comparisons. d Standardized relative abundance of metabolites across two Sub-classes, with total values reflecting metabolite counts. e Venn diagram showing the overlap of differentially accumulated metabolites (DAMs) across four comparative analyses. f Secondary classification (Class II) of 55 flavonoid metabolites out of the 163 shared DAMs from (e), along with their relative accumulation in different sample sets. A higher Z-score corresponds to a greater standardized relative concentration of DAMs

Widely targeted metabolomic profiling

To ensure consistency, floral bud tissues from each plant at the same developmental stage were pooled and freeze-dried, yielding a total of 15 composite samples for further analysis. For metabolite extraction, 50 mg of freeze-dried sample powder was suspended in 1.2 mL of 70% methanol, followed by vortexing for 30 s at 30-min intervals over six cycles. The mixture was then centrifuged at 12,000 rpm for three minutes to collect the supernatant. Metabolites were analyzed using an Ultra Performance Liquid Chromatography (UPLC) system (SHIMADZU Nexera X2) coupled with a mass spectrometry (MS/MS) platform (Applied Biosystems 6500 Q TRAP). The UPLC system utilized an Agilent SB-C18 column (1.8 µm, 2.1 mm × 100 mm), with ultrapure water containing 0.1% formic acid as mobile phase A and acetonitrile with 0.1% formic acid as mobile phase B. The flow rate was set to 0.35 mL/min, with the column maintained at 40 °C, and an injection volume of 2 μL. MS was performed under the following conditions: an electrospray ionization temperature of 500 °C, ion spray voltages of 5500 V for positive mode and -4500 V for negative mode, and gas pressures of 50 psi (gas I), 60 psi (gas II), and 25 psi (curtain gas), with high collision-activated dissociation. A quality control (QC) sample, consisting of an equal mixture from the 15 samples, was analyzed every 15 runs to monitor reproducibility. Both qualitative and quantitative metabolite analyses were conducted using Analyst v1.6.3, referencing a database developed by Metware Biotechnology.

Differentially accumulated metabolite analysis

Principal component analysis (PCA) was applied to all detected metabolites across the 15 samples using the R function prcomp (www.r-project.org). Differentially accumulated metabolites (DAMs) between developmental stages were identified through orthogonal partial least squares-discriminant analysis (OPLS-DA), with selection criteria set at variable importance in projection (VIP) > 1 and |log2 fold change| > 1. To explore the trends of DAMs during flower development, metabolite content was first standardized using the Z-score method, followed by K-means clustering analysis via the R function kmeans. The optimal number of clusters was determined using the Elbow method. Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of DAMs was conducted through Metware Cloud (https://cloud.metware.cn/), with a significance threshold of P < 0.05.
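The DAM screening rule and the Z-score standardization described above can be illustrated with a minimal sketch. It assumes that the VIP value has already been obtained from an OPLS-DA model fitted elsewhere, that the fold change is the ratio of group means, and that the Z-score uses the sample standard deviation; none of these details is stated in the text. The function names are hypothetical and do not correspond to the Metware pipeline.

```python
import math
from statistics import mean, stdev


def z_scores(values):
    """Standardize one metabolite's abundances (assumption: sample SD, n - 1)."""
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        # A constant profile carries no trend information.
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def log2_fold_change(control, treatment):
    """log2 of the ratio of group means (assumed definition of fold change)."""
    return math.log2(mean(treatment) / mean(control))


def is_dam(vip, control, treatment, vip_cutoff=1.0, lfc_cutoff=1.0):
    """Apply the two thresholds from the text: VIP > 1 and |log2 FC| > 1."""
    lfc = log2_fold_change(control, treatment)
    return vip > vip_cutoff and abs(lfc) > lfc_cutoff


if __name__ == "__main__":
    s1 = [100.0, 110.0, 90.0]   # hypothetical abundances at S1
    s5 = [400.0, 420.0, 380.0]  # hypothetical abundances at S5
    print(log2_fold_change(s1, s5), is_dam(1.3, s1, s5))
```

Both thresholds must be met, so a metabolite with a large fold change but a VIP of at most 1 is not counted as a DAM.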

RNA extraction and transcriptome sequencing

Total RNA was isolated from the 15 samples using the EASYspin Plus Plant RNA Kit (Aidlab, China). RNA-seq libraries were prepared using the NEBNext Ultra™ RNA Library Prep Kit for Illumina (NEB, USA), followed by paired-end (150 bp) sequencing on the HiSeq™ 2000 platform (Illumina, USA).

Differentially expressed gene analysis

Raw sequencing reads were processed with Trimmomatic v0.36 (Bolger et al. 2014) to remove adaptors and low-quality sequences. Clean reads were then mapped to the C. sinensis reference genome (version GCA_013676235.1, https://www.ncbi.nlm.nih.gov/) using HISAT2 (Kim et al. 2015). Novel gene predictions were conducted with StringTie v2.0 (Pertea et al. 2015). Gene expression levels were quantified and visualized as fragments per kilobase per million reads (FPKM). Differences between and within sample groups were analyzed via PCA. Differentially expressed genes (DEGs) between groups were identified using the R package DESeq2 (Love et al. 2014), with thresholds set at a false discovery rate (FDR) < 0.05 and |log2 fold change|> 1. DEGs were annotated through DIAMOND (Buchfink et al. 2015) using the NR, Swiss-Prot, TrEMBL, and KOG databases, with an E-value cutoff of < 1e-5. K-means clustering was applied to DEGs following the same procedure used for DAMs. Gene Ontology (GO) and KEGG pathway analyses of DEGs were performed through Metware Cloud with an FDR threshold of < 0.05. Transcription factors (TFs) and transcriptional regulators (TRs) associated with DEGs were predicted using iTAK (Zheng et al. 2016).
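For reference, the FPKM unit used above follows the standard definition: fragments mapped to a gene, scaled per kilobase of gene length and per million mapped fragments in the library. The short sketch below shows only that arithmetic; the function name and the example numbers are hypothetical, and it does not reproduce the quantification tool's internal handling of multi-mapped or novel transcripts.

```python
def fpkm(fragments, gene_length_bp, total_mapped_fragments):
    """Fragments per kilobase of gene length per million mapped fragments."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    # 1e9 = 1e3 (bp -> kb) * 1e6 (fragments -> millions of fragments)
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


if __name__ == "__main__":
    # Hypothetical gene: 500 fragments, 2,000 bp long, 25 million mapped fragments.
    print(fpkm(500, 2000, 25_000_000))
```

Because the value is normalized by both gene length and library size, expression levels can be compared across genes and across the 15 libraries, although DESeq2 itself works from raw counts.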

Integrated metabolomic and transcriptomic analyses

Pearson's correlation coefficients (R) were calculated between DEGs and DAMs for each comparison group. To identify biological pathways jointly influenced by DAMs and DEGs, shared pathways were determined from their respective KEGG enrichment analyses. The flavonoid biosynthetic pathway, one of the shared pathways, was specifically plotted. Genes annotated to key enzymes or proteins involved in the flavonoid biosynthesis pathway through DIAMOND were defined as key DEGs. One-way analysis of variance (ANOVA) was conducted to assess variations in the expression levels of these key DEGs across different stages of flower development. To further elucidate the regulatory network of flavonoid biosynthesis, DAMs highly correlated with key DEGs (|R|> 0.8) were screened. Metabolites involved in both the flavonoid synthesis pathway and the regulatory network were classified as key DAMs. Additionally, weighted gene co-expression network analysis (WGCNA) was performed using the R package WGCNA (Langfelder and Horvath 2008), following standard protocols. Pearson's correlation coefficients were calculated for gene modules and several key DAMs, with the top 20 weighted genes in each module designated as hub genes.
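The correlation screen described above can be sketched as follows: compute Pearson's R between the expression profile of each key gene and the abundance profile of each metabolite across the same samples, then keep pairs with |R| > 0.8. This is a minimal illustration in plain Python; the dictionary layout, function names, and example profiles are assumptions, not the authors' code.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile has no defined correlation; treat as 0
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(genes, metabolites, cutoff=0.8):
    """Return (gene, metabolite, R) for every pair with |R| above the cutoff."""
    pairs = []
    for gene, expr in genes.items():
        for metab, abund in metabolites.items():
            r = pearson_r(expr, abund)
            if abs(r) > cutoff:
                pairs.append((gene, metab, r))
    return pairs


if __name__ == "__main__":
    genes = {"geneA": [1, 2, 3, 4, 5]}  # hypothetical FPKM profile
    metabolites = {
        "m_up": [2, 4, 6, 8, 10],
        "m_down": [10, 8, 6, 4, 2],
        "m_flat": [3, 1, 4, 1, 5],
    }
    for gene, metab, r in correlated_pairs(genes, metabolites):
        print(gene, metab, round(r, 3))
```

Taking the absolute value retains both positively and negatively correlated gene-metabolite pairs, which matches the red and blue edges described for the correlation network.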

Quantitative real-time polymerase chain reaction for gene expression

To validate the findings, quantitative real-time polymerase chain reaction (qRT-PCR) assays were conducted for five randomly selected DEGs and five hub genes related to the flavonoid pathway. The internal reference gene, glyceraldehyde-3-phosphate dehydrogenase (GAPDH), was selected as previously described (Yin et al. 2016). Specific primer sequences were designed using Primer Premier v5.0 (Table S1). cDNA synthesis was carried out using the MightyScript First Strand cDNA Synthesis Master Mix (Sangon Biotech Co., Ltd., China) following the manufacturer's instructions. The qRT-PCR reactions were performed in 20 μL volumes using TB Green Premix Ex Taq II (TaKaRa Biotechnology Co. Ltd., China) and analyzed with Rotor-Gene Q Series software. The cycling program included an initial denaturation at 95 °C for 30 s, followed by 45 cycles of 95 °C for 5 s and 60 °C for 30 s, ending with a melt-curve analysis. The relative expression levels were calculated using the 2^(−ΔΔCt) method (Livak and Schmittgen 2001), with all reactions conducted in triplicate.
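The Livak and Schmittgen calculation can be written out as a short worked sketch. It assumes that technical triplicates are averaged before the ΔCt step and that one stage (for example S1) serves as the calibrator; the text does not state either choice. The Ct values below are hypothetical and the function name is illustrative.

```python
from statistics import mean


def relative_expression(target_sample, ref_sample, target_calibrator, ref_calibrator):
    """Relative expression by the 2^(-ddCt) method.

    Each argument is a list of technical-replicate Ct values: the target gene
    and the reference gene (GAPDH here) in the sample of interest and in the
    calibrator sample.
    """
    d_ct_sample = mean(target_sample) - mean(ref_sample)
    d_ct_calibrator = mean(target_calibrator) - mean(ref_calibrator)
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2 ** (-dd_ct)


if __name__ == "__main__":
    fold = relative_expression(
        target_sample=[22.1, 21.9, 22.0],      # hypothetical Ct, target gene at S3
        ref_sample=[18.0, 18.1, 17.9],         # hypothetical Ct, GAPDH at S3
        target_calibrator=[25.0, 25.1, 24.9],  # hypothetical Ct, target gene at S1
        ref_calibrator=[18.0, 17.9, 18.1],     # hypothetical Ct, GAPDH at S1
    )
    print(round(fold, 2))
```

A ΔΔCt of −3 corresponds to an eight-fold higher expression than the calibrator, since each cycle represents a doubling under the method's assumption of near-100% amplification efficiency.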

Results

Metabolomic profiling

A total of 1,160 metabolites were identified using UPLC-MS/MS across 15 samples representing five distinct developmental stages of C. perpetua flowers (Fig. 1a, Table S2). PCA revealed a clear separation of metabolite profiles among these stages, with PC1, PC2, and PC3 accounting for 34.94%, 20.95%, and 10.88% of the total variance, respectively (Fig. 1b). OPLS-DA scores further confirmed significant differentiation among samples from various flowering stages (Fig. S1). The detected metabolites were categorized into 15 primary classes, with flavonoids (21.6%) being the most prevalent, followed by phenolic acids (19.1%) and lipids (9.6%) (Fig. 1c). These results indicate that flavonoid and phenolic acid metabolism are key processes during flower development. K-means clustering analysis classified the metabolites into two sub-classes with distinct patterns. Sub-class 1, comprising 212 metabolites, displayed an increasing trend followed by a decrease during development, while Sub-class 2, consisting of 340 metabolites, showed a continuous upward trend (Fig. 1c, d). Using S1 as the control, the comparison of S1 with S2, S3, S4, and S5 revealed 295, 350, 365, and 391 DAMs, respectively, with the majority of these DAMs being upregulated (Fig. 1c, Table S3). KEGG pathway enrichment analysis of DAMs indicated that linoleic acid metabolism (ko00591) was the most significant pathway in both S1 vs. S2 and S1 vs. S3 (Fig. S2). However, flavone and flavonol biosynthesis (ko00944) and flavonoid biosynthesis (ko00941) emerged as the most enriched pathways in S1 vs. S4 and S1 vs. S5, respectively (Fig. S2). These results suggest that linoleic acid metabolism plays a central role during the early stages of flower development, with a subsequent shift towards flavonoid biosynthesis as the flowers mature. A total of 163 shared DAMs were identified across the four comparisons, with flavonoids (55) constituting the largest group, significantly outnumbering other metabolite classes (Fig. 1e). A more detailed breakdown of the 55 key flavonoid DAMs (Class II) revealed 25 flavonols, 19 flavones, four flavanones, three chalcones, two flavanonols, and two other flavonoid types (Fig. 1f). These data emphasize the critical role of flavonoids, particularly flavonols and flavones, in the developmental processes of C. perpetua flowers.

Transcriptomic analysis

After processing the raw sequencing data, a total of 697,058,956 clean reads (104.56 G) were obtained, with average Q20, Q30, and GC content values of 97.28%, 92.56%, and 45.22%, respectively (Table S4). The mapping rate of individual samples to the C. sinensis reference genome ranged from 73.14% to 77.33%. In total, 51,763 genes (FPKM > 0) were expressed across the 15 samples, comprising 27,384 known genes and 24,379 putative novel genes, with 9,767 genes (FPKM > 0) being expressed in all samples (Fig. 2a). Principal component analysis (PCA) revealed high consistency in gene expression patterns among the biological replicates (Fig. 2b).

Fig. 2

Gene Expression Profiles Across Five Stages of Flower Development. a Gene expression levels for each sample, measured in FPKM (fragments per kilobase per million reads). b Principal component analysis illustrating sample variability. c Venn diagram representing the overlap of differentially expressed genes (DEGs) across four comparative analyses. d Expression patterns and trends of DEGs over the developmental stages. e Identification of transcription factors (TFs) and transcriptional regulators (TRs)

A total of 21,152 DEGs were identified across the comparisons between S1 vs. S2, S1 vs. S3, S1 vs. S4, and S1 vs. S5, with 5,473, 10,403, 14,310, and 17,003 DEGs detected in each comparison, respectively (Fig. S3). The percentages of upregulated and downregulated DEGs were similar across the four comparisons, with the number of DEGs unique to a single comparison ranging from 537 to 4,642 (Fig. 2c). K-means clustering analysis grouped the 21,152 DEGs into four sub-classes, each displaying distinct expression patterns (Fig. 2d). Sub-class 1 (4,761 DEGs) and Sub-class 4 (10,167 DEGs) showed progressive increases and decreases in expression, respectively, as the flowers matured. In contrast, Sub-class 2 (3,562 DEGs) peaked at S4, while Sub-class 3 (2,642 DEGs) exhibited maximum expression at S2. GO enrichment analysis of the DEGs highlighted significant involvement in processes such as the mitotic cell cycle (GO:1903047), photosystem I (GO:0009522), anchored component of membrane (GO:0031225), and mitotic cell cycle phase transition (GO:0044772) (Fig. S4). KEGG pathway analysis further revealed that DEGs related to photosynthesis were predominant during the early stages of flower development, while genes associated with flavonoid biosynthesis showed progressive increases as the flowers matured (Fig. S5). iTAK predictions identified 902 TFs from 55 families and 176 TRs from 19 families among the DEGs (Fig. 2e). To validate the RNA-seq data, qRT-PCR was performed on ten randomly selected genes, and the relative expression levels were generally consistent with the RNA-seq results, confirming the reliability of the transcriptome analysis (Fig. S6).

Integrated metabolomic and transcriptomic analyses

Integrated metabolomic and transcriptomic analyses revealed substantial Pearson correlation coefficients (|R|> 0.8) between metabolites and genes in the comparisons of S1 vs. S2, S1 vs. S3, S1 vs. S4, and S1 vs. S5 (Fig. S7). These results suggest that changes in metabolite accumulation may be either directly or indirectly regulated by the corresponding genes. KEGG pathway analysis of DAMs and DEGs identified two significant shared pathways: flavonoid biosynthesis (ko00941) and propanoate metabolism (ko00640), both with P-values less than 0.05 (Table S5). These pathways may play pivotal roles in the floral development of C. perpetua, with flavonoid biosynthesis (ko00941) being of particular interest due to the central role flavonoids play in C. perpetua flower development, especially in the formation of yellow flowers.

Genes and metabolites involved in the flavonoid biosynthesis pathway

After eliminating duplicate names, a total of 19 metabolites and 140 genes were identified as part of the flavonoid biosynthesis pathway (ko00941). Functional annotation of the 140 genes revealed that 50 of them encode 11 key enzymes or proteins involved in the flavonoid biosynthetic pathway, and these 50 were thus designated as key genes (Fig. 3a). One-way ANOVA revealed significant expression differences (P < 0.01) in 49 of these key genes across different developmental stages, with the exception of C. perpetua flavonoid 3′,5′-hydroxylase (F3′5′H) 4 (Table S6). Additionally, Pearson correlation analysis (|R| > 0.8) identified 713 metabolites strongly associated with these 50 key genes (Fig. 3b). Among these, 17 metabolites were located within the flavonoid biosynthesis pathway (ko00941) and were subsequently classified as key DAMs (Fig. 3b and Table S7). These results further highlight the critical role of flavonoid metabolism in the developmental progression and yellow flower formation in C. perpetua.

Fig. 3

Flavonoid Biosynthesis Pathways in C. perpetua Flowers. a Key genes and metabolites involved in the flavonoid biosynthesis pathway. Enzymes or proteins encoded by the genes identified in this study are shown in blue. Differentially expressed genes (DEGs), highlighted in red, represent the hub genes identified in Fig. 4c. Differentially accumulated metabolites (DAMs), also in red, are utilized for co-expression network analysis in Fig. 4a. Gene expression levels are quantified in FPKM (fragments per kilobase per million reads). b Correlation analysis between key genes and 713 metabolites, including 17 key metabolites (Table S7) involved in the flavonoid biosynthesis pathway. Red lines indicate positive correlations, while blue lines represent negative correlations. PAL, phenylalanine ammonia lyase; C4H, cinnamate 4-hydroxylase; 4CL, 4-coumaroyl CoA ligase; CHS, chalcone synthase; CHI, chalcone isomerase; F3H, flavanone 3-hydroxylase; F3′H, flavonoid 3′-hydroxylase; F3′5′H, flavonoid 3′,5′-hydroxylase; FLS, flavonol synthase; DFR, dihydroflavonol 4-reductase; ANS, anthocyanidin synthase; LAR, leucoanthocyanidin reductase; ANR, anthocyanidin reductase

Identification of hub genes in the flavonoid synthesis pathway by WGCNA

Among the 17 key DAMs, six metabolites—chlorogenic acid (mws0178), luteoforol (Lmgp004167), hesperetin-7-O-glucoside (Lmzp002365), myricetin (mws0032), vitexin (mws0048), and tricetin (mws0920)—were consistently identified across four different comparisons (Fig. 1e). To further investigate the regulatory networks associated with these key metabolites, co-expression network analysis was performed using WGCNA. A total of nine gene modules were identified at a soft threshold of 18, based on the top 1,000 genes with the highest mean absolute deviation (Fig. S8 and Fig. 4a). Correlation analysis indicated that the magenta module showed significant correlations with all six metabolites. Notably, it displayed negative correlations with most metabolites (R ranging from -0.80 to -0.64, P < 0.01), except for chlorogenic acid, which had a positive correlation (R = 0.77, P < 0.01) (Fig. 4b). Hub genes for each module were identified based on gene weight, with 20 hub genes selected per module (Table S8). Of particular significance, the black module contained 20 hub genes, eight of which are recognized as being involved in the flavonoid biosynthesis pathway (Fig. 3a and Fig. 4c). This suggests that genes in the black module may have a direct role in regulating flavonoid production. KEGG pathway analysis of the black module further highlighted its critical involvement in the flavonoid biosynthesis pathway (Fig. 4d).
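The two preprocessing ideas mentioned above, ranking genes by variability and applying a soft-thresholding power, can be illustrated with a small sketch. It assumes an unsigned network, in which the adjacency between two genes is |R| raised to the power β (here 18), and it ranks genes by mean absolute deviation as stated in the text; the network type is not given in the source, and the function names and toy profiles are hypothetical. It is not a substitute for the WGCNA R package, which additionally builds the topological overlap matrix and detects modules.

```python
import math


def mean_abs_deviation(values):
    """Mean absolute deviation of one gene's expression profile."""
    mu = sum(values) / len(values)
    return sum(abs(v - mu) for v in values) / len(values)


def top_variable_genes(expr, n):
    """Names of the n genes with the largest mean absolute deviation."""
    ranked = sorted(expr, key=lambda g: mean_abs_deviation(expr[g]), reverse=True)
    return ranked[:n]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(expr, genes, power=18):
    """Unsigned adjacency |R|**power for each gene pair (assumed network type)."""
    adjacency = {}
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            adjacency[(a, b)] = abs(pearson_r(expr[a], expr[b])) ** power
    return adjacency


if __name__ == "__main__":
    expr = {  # hypothetical expression profiles across five stages
        "g1": [1, 5, 9, 13, 17],
        "g2": [2, 4, 6, 8, 10],
        "g3": [5, 5, 5, 5, 6],
        "g4": [9, 7, 5, 3, 2],
    }
    selected = top_variable_genes(expr, 3)
    print(selected, soft_threshold_adjacency(expr, selected))
```

Raising |R| to a high power suppresses weak correlations far more than strong ones, so only tightly co-expressed gene pairs keep a substantial connection strength.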

Fig. 4

Co-Expression Network Analysis of Genes and Metabolites. a A hierarchical cluster tree illustrates nine gene co-expression modules, distinguished by color. The accompanying heat map depicts the correlation between genes and six key metabolites. b Correlation analysis between the gene modules and target metabolites. c Identification of 20 hub genes within the black module, with genes highlighted in red corresponding to those identified in Fig. 3a. d Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis of genes in the black module, with pathways marked in red indicating significance at a false discovery rate (FDR) < 0.05

Discussion

Metabolomic profiling changes during the flower development of C. perpetua

The development of plant flowers is intricately linked to the accumulation of metabolites that drive alterations in flower color, morphology, and fragrance (Abbas et al. 2021; Gao et al. 2022; Ren et al. 2022). In this study, a comprehensive UPLC-MS/MS analysis was employed to investigate the metabolites in C. perpetua flowers, leading to the identification of 1,160 distinct metabolites (Fig. 1c). Notably, flavonoids accounted for the largest class, with 250 metabolites (21.6%), which aligns with the recognized flavonoid abundance in golden camellia herbal tea products (Zhao et al. 2022). The number of flavonoid metabolites identified significantly exceeds that reported in previous studies of other golden camellia species (Li et al. 2019a, b; Mo et al. 2020), highlighting the enhanced detection capability of widely targeted metabolomics. Nonetheless, the overall metabolite count in C. perpetua flowers was lower than that reported for the leaves of C. limonia and C. nitidissima in a previous metabolomic study (Liu et al. 2022a), suggesting substantial variation in metabolite composition between species or tissue types. Consequently, metabolomic profiling of other tissues (e.g., leaves) may offer further insights into the dynamic changes in metabolites during the growth and development of C. perpetua.

Flavonoids, as natural bioactive compounds, are abundant in several Camellia species, such as tea (C. sinensis), oil tea (C. oleifera), and golden camellia (C. nitidissima) (Li et al. 2019a; Lv et al. 2022; Song et al. 2022b; Jiang et al. 2024). Previous studies have suggested a potential link between certain flavonoid constituents and yellow flower formation in golden camellia species (He et al. 2018; Li et al. 2019a). However, since not all known flavonoid components were detected in C. perpetua, the development of yellow flowers in golden camellia species is likely not governed by a few specific metabolites but rather by the complex interplay of numerous metabolites. In line with this, a recent study on C. oleifera demonstrated a continuous increase in flavonoid and phenolic acid content during flower maturation (Huang et al. 2021a). In C. perpetua flowers, phenolic acids represented the second-largest class of metabolites after flavonoids, and their levels were predominantly up-regulated as the flowers matured (Fig. 1c), suggesting a possible indirect role of phenolic acids in yellow flower development through alternative pathways. Despite this, the content, composition, and biological roles of phenolic acids in golden camellia species remain underexplored. The metabolomic comparison across five developmental stages of C. perpetua flowers revealed 552 differentially accumulated metabolites (DAMs), with 163 showing significant differences across all four comparisons (Fig. 1e). These results offer key insights into the chemical basis underlying yellow flower development in golden camellia species.

Gene expression variations during C. perpetua flower development

Comparative transcriptome analysis across different stages of C. perpetua flower development revealed a progressive increase in DEGs, rising from 5,473 in S1 vs. S2 to 17,003 in S1 vs. S5 (Fig. 2c). This suggests that a growing number of DEGs participate in transcriptional regulation as the flower matures. These findings align with previous studies (Zhou et al. 2017; Li et al. 2018). However, Zhou et al. (2017) observed that genes exhibiting expression changes across all five developmental stages in C. nitidissima were relatively rare, whereas the present study identified 3,324 DEGs common across the four comparisons in C. perpetua (Fig. 2c). This discrepancy may be attributed to the unique gene transcription patterns exhibited by multi-season flowering C. perpetua, which may regulate yellow flower development differently compared to single-season flowering golden camellia species (Jia et al. 2014; Yang et al. 2021a).

Among the identified DEGs, TFs and TRs were notably abundant (Fig. 2e). For instance, the basic Helix-Loop-Helix (bHLH) TF family showed the highest DEG enrichment. Members of the bHLH family are critical in regulating plant growth and developmental processes. In a previous quantitative trait locus (QTL) mapping study of an F1 C. sinensis population, bHLH-encoding candidate genes were identified in QTLs associated with flavonoid traits (Xu et al. 2018). More recently, bHLH genes (bHLH3 and bHLH18) were shown to regulate flavonoid accumulation in C. oleifera seeds by modulating the transcription of structural genes such as chalcone synthase (CHS), chalcone isomerase (CHI), F3H, flavonol synthase (FLS), anthocyanidin synthase (ANS), dihydroflavonol 4-reductase (DFR), leucoanthocyanidin reductase (LAR), and anthocyanidin reductase (ANR) (Song et al. 2022b). Similarly, dual-luciferase assays and transgenic experiments in C. japonica demonstrated that the co-expression of bHLH1 with MYB114 activates the promoter of C. japonica DFR, leading to enhanced anthocyanin accumulation (Zhang et al. 2024). These results suggest that certain bHLH DEGs may play analogous roles in C. perpetua flower development.

The Far-Red Impaired Response (FAR) TF family was the second most enriched group of DEGs after bHLH (Fig. 2e). FAR is involved in chloroplast division, chlorophyll biosynthesis, light signal transduction, hormone response, and pollen germination during plant development (Ruckle et al. 2007; Tang et al. 2013; Chen et al. 2021). Given the close association of these processes with photosynthesis, FAR may indirectly regulate C. perpetua flower development by modulating photosynthetic pathways. In summary, functional analysis of the DEGs revealed that genes related to photosynthesis and flavonoid biosynthesis play pivotal roles in regulating C. perpetua flower development. Specifically, genes involved in photosynthesis predominantly influence early-stage flower development, while those associated with flavonoid biosynthesis contribute to the regulation of mid-to-late developmental stages.

Integrated metabolomic and transcriptomic analyses reveal the pathways involved in flavonoid biosynthesis in C. perpetua flowers

This study represents the first comprehensive exploration of the developmental pathways leading to yellow flowers in golden camellia species using widely targeted metabolomic and transcriptomic analyses. A schematic model summarizing the molecular mechanisms underlying yellow flower development in C. perpetua is presented, with a focus on key flavonoid biosynthesis genes (Fig. 5). In this pathway, FLS plays a pivotal role in the synthesis of kaempferol and quercetin (Ashihara et al. 2010). Overexpression of the C. sinensis FLS gene in tobacco has been shown to significantly increase kaempferol levels (Jiang et al. 2020), while suppression of this gene in C. sinensis results in a marked reduction of flavonol concentrations (Song et al. 2024). Liu et al. (2022b) identified elevated expression levels of two FLS genes as a possible promoter of yellow flower formation in C. nitidissima. In a similar vein, the current study uncovered six FLS genes within the flavonoid biosynthesis pathway of C. perpetua, including CperFLS2, which was identified as a hub gene in the WGCNA module black (Fig. 3a, Fig. 4c), suggesting its critical role in flavonoid biosynthesis in C. perpetua.

Fig. 5

Schematic Model of the Molecular Mechanisms Underlying Yellow Flower Development in C. perpetua. CHI, chalcone isomerase; F3H, flavanone 3-hydroxylase; F3'H, flavonoid 3'-hydroxylase; FLS, flavonol synthase; DFR, dihydroflavonol 4-reductase; ANS, anthocyanidin synthase.

The module black hub genes also include two CHI genes (CperCHI2, CperCHI5), two F3H genes (CperF3H1, CperF3H2), one flavonoid 3'-hydroxylase (F3'H) gene (CperF3'H1), one DFR gene (CperDFR1), and one ANS gene (CperANS4). Several of these gene families have been frequently reported in studies of the flavonoid biosynthesis pathway in C. sinensis. For instance, QTL mapping identified a crucial CHI gene involved in the early stages of flavonoid biosynthesis (Xu et al. 2018). During C. sinensis leaf development, Wu et al. (2019) used RNA-seq to show that CHI and FLS share a similar transcriptional pattern, with the expression of both genes decreasing as the leaves matured. Additionally, WGCNA and in vitro treatments indicated that sugar-induced regulation of flavonoid biosynthesis in C. sinensis might involve C. sinensis F3'H as a target gene of an ethylene-responsive factor-like TF (Lv et al. 2022).

Metabolomic and transcriptomic analysis of five different tissues from C. lanceoleosa revealed that flavonoids were the predominant compounds, and differentially expressed structural genes such as CHS, DFR, and ANR were identified as hub genes in the flavonoid biosynthesis pathway (Song et al. 2022a). A key characteristic of the eight hub genes in the C. perpetua flavonoid biosynthesis pathway (module black) was their significantly higher expression compared to other genes within the same family (Fig. 3a). Notably, the expression of these eight hub genes peaked during the early bud stage (S2) or the yellowing stage (S3), consistent with previous findings that genes involved in flavonoid biosynthesis displayed high expression levels from S1 to S4 and decreased expression at S5 in C. nitidissima (Zhou et al. 2017). Although Li et al. (2018) divided flower development in C. nitidissima and C. chuongtsoensis (synonym of C. perpetua) into only four stages, the expression trends of flavonoid biosynthesis genes they reported were consistent with those observed in this study. In summary, the metabolomic and transcriptomic evidence suggests that certain flavonoid biosynthesis genes promote yellow flower development in C. perpetua through high expression, particularly during the early bud and yellowing stages, and that this mechanism may be conserved across golden camellia species. Although genes involved in the carotenoid biosynthesis pathway are believed to play a significant role in yellow flower formation in golden camellia species (Zhou et al. 2017), carotenoids were not included in our metabolomic assay. The current findings should therefore be interpreted with caution, and more extensive metabolite profiling and further gene function validation are essential for a comprehensive understanding of the molecular mechanisms driving yellow flower development.

Conclusions

A temporal map detailing the metabolomic and transcriptomic processes during flower development in the rare continuously flowering C. perpetua has been constructed. Through integrated analysis across five developmental stages, 552 DAMs and 21,152 DEGs were identified. This study highlights the critical regulatory function of the flavonoid biosynthesis pathway in yellow flower development, revealing a complex interaction network involving 50 key genes and 17 key metabolites. The elucidation of these molecular mechanisms provides a robust scientific foundation for advancing molecular breeding efforts and optimizing the utilization of golden camellia species.