Background

Asian cotton (Gossypium arboreum L.) is one of the significant sources of natural fiber, initially introduced to China from the Indian subcontinent during the twelfth century [1]. It possesses resilience against biological and environmental stresses and is also a natural source of genetic variation for fiber-related traits [2,3,4,5]. Asian cotton has a broad spectrum of phenotypic variation related to pigmentation. The petals are often yellow, yellow–red, purple, or white with and without purplish-red basal spots [6]. There is a long history of research into the color of cotton flowers [7]. The flower color of cotton has also become a tool for genetic and taxonomic studies of cotton [8,9,10]. Brightly colored petals attract insects like bees and butterflies [11], increasing pollen dispersal, increasing heterosis, and enriching germplasm resources. Accordingly, it is well known that hybridization produces abundant variation. Hybrid cotton has higher fiber yield than inbred cotton [12, 13]. However, the current cotton hybrid breeding is limited by the low pollination rate of natural hybridization [12, 14]. Insect pollination is an effective way of cotton hybrid breeding. Flower color is an important phenotypic trait affecting insect pollination [15, 16]. It is an ideal genetic improvement strategy to generate cotton varieties with highly attractive for pollinator. Moreover, colorful flowers are a significant addition to the horticultural industry [17]. Furthermore, Asian cotton, as a diploid closely related to the At genome of upland cotton, has a high-quality gene source. It is relatively easier to realize the characterization of genetic mechanisms than the allotetraploid with complex genomes. Therefore, exploiting cotton flower color has far-reaching implications for the diversity of cotton germplasm resources.

Previously published reports suggested varying levels of co-accumulation of secondary metabolites responsible for plant pigmentations, such as betalains, flavonoids, and carotenoids [18,19,20,21,22]. In general, flavonoids in plants are classified into six groups, chalcones, flavonoids, flavonols, isoflavonoids, anthocyanins, and flavanols [23]. Among them, flavonoids and flavonols are one of the sources of color in fruits and flowers, which are usually yellow or colorless. Quercetin and gossypetin are known flavonoids involved in synthesizing yellow pigments in plants [24,25,26]. Flavonols like kaempferol, quercetin, and myricetin have important medical properties such as free radical scavenging and antioxidant [27,28,29]. Anthocyanins are a type of flavonoid that can be found in a variety of plants as a natural water-soluble pigment. They are responsible for the color development of flowers, leaves, and fruits [30,31,32,33,34,35,36,37]. Phenylpropanoid biosynthesis, flavonoid metabolism, and anthocyanin metabolism are three major stages of flavonoid biosynthesis in plants. Flavonoid biosynthesis is aided by the structural genes PAL, C4H, 4CL, CHI, CHS, F3H, DFR, ANS, UFGT, and 3GT [38]. The F3H gene was first cloned from Artemisia annua, and it could convert pinealin to dihydrokaempferol by in vitro enzyme activity analysis [39]. In Cheng's study, three DFR genes (DFR1, DFR2, DFR3) were cloned from Ginkgo biloba and found that DFR1 could convert dihydroquercetin into colorless anthocyanins, while DFR2 could convert dihydrokaempferol into white leucopelargonidin [40]. Transcription factors are another type of regulatory gene involved in the phenylpropanoid-flavonoid synthesis pathway. MYB, bHLH, MADS-box, and WD40 are the main identified in the previous study that play a regulatory role in the flavonoid synthesis pathway [41, 42], with MYB, bHLH, and WD40 forming a ternary complex to regulate the expression of structural genes [43]. The bHLH transcription factor and the R2R3-MYB protein activate anthocyanin biosynthetic genes in Petunia and most other dicotyledons [44]. Gonzales et al. [45] found that Arabidopsis MYB75, MYB90, MYB113 and MYB114 regulate the expression of the Arabidopsis anthocyanin biosynthesis genes F3'H, DFR, ANS and UFGT.

Advancements in omics have enabled us to integrate multi-omics better to understand regulatory mechanisms behind particular traits [46]. Combining transcriptome and metabonomic methods, Zhang et al. [47] studied the peeling process of winter jujube. They found that a large amount of pigment is deposited in the cell wall, revealing the metabolic pathways and key genes that control the biosynthesis of lignin during the peeling process of winter jujube. Wang et al. [48] emphasized that the red fading of 'Red Bartlett' pears is closely related to the decrease in anthocyanin synthesis, increase in degradation, and inhibition of anthocyanin transport. Another study on Tunisian soft-seed pomegranate [49] identified 51 phenolic compounds, most contained in red–purple pomegranate arils significantly higher than those in light red pomegranate arils. Similarly, the combination of transcriptomic and metabolite approaches, researchers identified VuMYB90-1, VuMYB90-2, VuMYB90-3, VuCPC, VuMYB4, bHLH, and WD40 proteins affecting the accumulation of anthocyanins and flavonoids through regulation structural genes expression [50].

Hybrid cotton varieties produce higher yields than inbred varieties. However, inefficient and costly cross-breeding is a pressing problem in cotton production today. Developing cotton varieties with highly attractive flowers for pollinators is an effective approach to reducing costs and increasing pollination efficiency. This study aimed to decipher petal color variation in Asian cotton utilizing a metabolomics platform coupled with transcriptomics. Our results will provide a genetic basis for flower color variation in Asian cotton, contributing to developing new high-yield hybrid varieties and further enriching cotton germplasm resources.

Result

Phenotypes of petals of different Asian cotton varieties

To comprehend the genetic and metabolite regulatory networks of different flower color varieties, we chose three representative varieties with different petal colors for our study, Shixiya 1, GA0146, and GA0149. On the day of anthesis, Shixiya 1 has a white corolla and dark red petal basal spots and is named W_Flo. The wild-type GA0146 and its mutant material GA0149 were biologically similar except for the difference in corolla color (Fig. 1A), with GA0146 having a yellow corolla and GA0149 having a purple corolla, named Y_Flo and P_Flo, respectively. We measured the total anthocyanin content of petals for each variety, depicting significant differences in anthocyanin accumulation in petals (Fig. 1B). W_Flo had the lowest anthocyanin content (24.25 mmol/g), while Y_Flo (50.08 mmol/g) and P_Flo (192.53 mmol/g) had significantly higher anthocyanin content.

Fig. 1
figure 1

Comparison of phenotype and total anthocyanin content of white, yellow and purple Asiatic cotton (Gossypium arboreum L.) petals. A Color observation of petals from ‘Shixiya 1’ (W_Flo), ‘GA0146’ (Y_Flo), and ‘GA0149’ (P_Flo) Asiatic cotton. B Total anthocyanin content of three G. arboreum petals. Letters displayed above each rectangle indicate significant differences between samples (p ≤ 0.05)

Overview of the metabolomic data

We determined metabolic profiles using LC–MS/MS to comprehend the differential accumulation of flavonoids and their impact on flower color regulation. Metabolic quantification was subjected to principal component analysis (PCA)(Fig. 2A). The first two components covered 65.61% variation, with 43.52% for PC1 and 22.09% for PC2. We quantified 569 metabolites from 12 major metabolite classes, including flavonoids (122, 21.44%), lipids (78, 13.71%), amino acids and derivatives (76, 13.36%), phenolic acids (60, 10.54%), organic acids (51, 8.96%), nucleotides and derivatives (51, 8.96%), alkaloids (36, 6.33%), tannins (15, 2.64%), lignans and coumarins (9, 1.58%%), steroids (1, 0.18%), terpenoids (1, 0.18%), and others (69, 12.13%) (Fig. S1) ( Additional file 1: Figure S1). The accumulation pattern of identified metabolites in three samples has been presented as a heatmap (Fig. 2B), representing the differential metabolic landscape of different petal colors. Moreover, replicates of each sample were grouped together in both PCA and cluster analysis, emphasizing the quality and reproducibility of the subjected datasets.

Fig. 2
figure 2

Preliminary analysis of metabolomics data. Comparison of metabolite composition and content in petals. A Principal component analysis (PCA) of the metabolite quantification in petals of three colors (W_Flo, Y_Flo, P_Flo). B Cluster analysis of all metabolites from samples of W_Flo, Y_Flo and P_Flo. In the heat map, different colors represent the degree of accumulation of each metabolite, with red to blue indicating high to low. C Detailed classification of flavonoid metabolites

Among identified metabolites, we focused on the accumulation pattern of flavonoids. Therefore, we characterized 134 metabolites, including 122 flavonoids and 12 proanthocyanidins (Table S1) (Additional file 2: Table S1). The identified flavonoids could be classified into subclasses, including chalcones, sinensetin, dihydroflavone, anthocyanins, flavonols, flavonoids, isoflavones, and proanthocyanidins (Fig. 2C). Heatmap depicting differential accumulation has been presented as supplementary Fig. S2 (Additional file 3: Figure S2). A significant proportion includes flavonols (61). Moreover, five anthocyanins were identified, including cyanidin-3-O-(6''-Malonylglucoside), pelargonidin 3-O-beta-D-glucoside, cyanidin-3- galactoside chloride, cyanidin 3-glucoside, and cyanidin-O-syringic acid.

We summarized and mapped the phenylpropanoid-flavonoid biosynthetic pathway (Fig. 3). Fifty-four metabolites were identified in the pathway, with their content varying across samples. Interestingly, most kaempferols were relatively high in W_Flo; major quercetin and its derivatives substantially accumulated in Y_Flo; gossypetin species were abundant in Y_Flo and P_Flo; while pelargonidin significantly accumulated in purple petals. In addition to cyanidin-O-syringic acid, three-quarters of cyanidin were highly accumulated in purple petals. The content of catechin derivatives gradually decreased, such as gallocatechin-gallocatechin-catechin (from 175,570(white) to 71,507(purple)). Most of the remaining flavonoid metabolites were high in yellow and purple petals. Therefore, we speculated that kaempferols, quercetin, gossypetin, cyanidin and pelargonidin are key factors in regulating the petal color in Asian cotton. Moreover, pelargonidin and cyanidin may cause the purple corolla. Gossypetin and quercetin are significant yellow pigments involved in the deposition of yellow petal pigments, while kaempferol is an essential metabolite regulating the formation of white petals.

Fig. 3
figure 3

Heat map of phenylpropanoid-flavonoid synthesis pathway of petals of three colors: W_Flo, Y_Flo and P_FLo. The color of each rectangle represents the average amount of flavonoid metabolites in the two biological replicates. The blue rectangle means low and the red means high. PAL: phenyl ammonia-lyase; C4H: cinnamate 4-hydroxylase; 4CL: 4-coumarate-CoA ligase; CHS: chalcone synthase; CHI: chalcone isomerase; DFR: dihydroflavonol 4-reductase; FLS: flavonol synthase; F3H: flavonoid 3-hydroxylase; F3’H: flavonoid 3′-hydroxylase; F3′5’H: flavonoid 3′5′-hydroxylase; FNS, flavonoid synthase; ANS, anthocyanidin synthase; UFGT: UDP-glucose: flavonoid 3- O-glucosyltransferase

Metabolic differences among the three colors petals of G. arboreum

Comparisons of metabolic profiles for three samples identified differential accumulation of metabolites with 140, 177, and 76 DAMs in W_Flo vs. Y_Flo, W_Flo vs. P_Flo, and Y_Flo vs. P_Flo, respectively (Fig. 4A and Table S2) (Additional file 4: Table S2). As shown in Fig. 4A, 215 metabolites had differential accumulation in at least one of the compared combinations among the three petals, with up-regulated DAMs varying from 27 to 72 and down-regulated DAMs ranging from 49 to 109. In addition, we identified conserved DAMs between different samples and identified 18 DAMs differentially accumulated in all three samples. These conserved DAMs include amino acids and derivatives, phenolic acids, nucleotides and derivatives, flavonoids, organic acids, and other six categories, with flavonoids being the most abundant class (Fig. 4B). These identified DAMs were mapped to the KEGG pathways for further enrichment analysis. Annotation of DAMs revealed that they are related to anthocyanin biosynthesis, phenylpropanoid biosynthesis, flavone and flavonol biosynthesis, flavonoid biosynthesis, isoflavonoid biosynthesis (Fig. 4C). It means that the metabolites responsible for the different colors of Asian cotton are mainly flavonoids.

Fig. 4
figure 4

Differentially accumulated metabolites (DAMs) analysis of metabolome. A The number of differentially accumulated metabolites between three kinds of petals. B Venn diagram showing the overlapping and unique DAMs in comparison groups. C The top 25 terms of KEGG enrichment of DAMs. The vertical axis represents the enriched KEGG pathways. The horizontal axis represents the name of comparison groups. The significantly enriched pathways are labeled by red text

We also screened for differential accumulated flavonoids (DAFs) (Table S3) (Additional file 5: Table S3) and found 66, 76 and 21 DAFs were detected in the three comparison groups (W_Flo vs. Y_Flo, W_Flo vs. P_Flo, and Y_Flo vs. P_Flo), respectively. A total of 6 overlapping DAFs were identified in the three comparison groups, including two anthocyanins (pelargonidin 3-O-beta-D-glucoside, cyanidin-3-O-(6''-Malonylglucoside)). Therefore, we speculated that differential accumulation of these anthocyanins may cause a change in cotton flower color. Moreover, in the comparative combination W_Flo vs. Y_Flo and W_Flo vs. P_Flo, as opposed to Y_Flo vs. P_Flo, the phenylpropanoid biosynthesis, flavone and flavonol biosynthesis, flavonoid biosynthesis, isoflavonoid biosynthesis and biosynthesis of secondary metabolites were all more significant. The results suggested that differential accumulation of metabolites in the phenylpropanoid-flavonoid pathway may cause yellow to purple flower color variation. Compared to white and yellow flowers, the DAMs were mainly enriched in isoquinoline alkaloid biosynthesis, phenylalanine metabolism, glutathione metabolism, glucosinolate biosynthesis and cyanoamino acid metabolism pathways, which may result in higher capacity of purple petals than white and yellow flowers in response to external stresses such as pests, oxidation, drought and temperature [51,52,53].

Overview of the transcriptome data and identification of DEGs

The metabolome is interpreted as the end product of genetic pathways with genes as basic regulators. To identify the genes involved in flower color changes, we further sequenced the transcriptome of Asian cotton petals using RNA-Seq technology to explore the regulatory mechanism of the flavonoid compounds causing flower color changes. We took the petals of W_Flo, Y_Flo and P_Flo on the day of flowering and constructed nine cDNA libraries with three biological replicates in each group. The sequencing yielded a total of 205.55 million reads, and after filtering out the low-quality reads, we obtained a total of 61.39 Gb clean data. The Q30 of each cDNA library was above 92% (Table S4) (Additional file 6: Table S4). We identified 40,960 genes and quantified the expression of these genes in petal tissues. PCA was performed using their FPKM values (Fig. 5A). The replicates in each group clustered together, validating the credibility of transcriptome data sets for further downstream analysis. Moreover, the expression profile based on FPKM values has been presented in Fig. 5B. To verify the accuracy of the transcriptome data, we selected 6 genes from the flavonoid biosynthetic pathway for qRT-PCR expression analysis. The significant positive correlation between RNA-Seq and qRT-PCR data is shown in Fig. S3 (Additional file 7: Figure S3). The relative gene expression level of qRT-PCR was consistent with the FPKM value of the RNA-Seq data, indicating that the RNA-Seq data is credible and accurate.

Fig. 5
figure 5

Statistical analysis of transcriptome data from three cotton petal samples. A PCA score plot. B cluster analysis. The color indicates the relative levels of genes from low (blue) to high (red)

Using |Log2Fold Change|≥ 1, FDR < 0.05, differentially expressed genes of samples with different flower colors were statistically analyzed (Fig. 6). The results showed that there were 4,777 DEGs in the comparison group of W_FLo vs. Y_FLo, in which 2,233 genes were up-regulated, 2,544 genes were down-regulated; in W_FLo vs. P_FLo, there were 6,244 DEGs, including 2,152 genes up-regulated and 4,103 genes down-regulated; there were the fewest number of DEGs (3,249) in ‘Y_FLo vs. P_FLo’ group, with only 842 genes up-regulated and 2,407 genes down-regulated. (Fig. 6A). By comparing the number of DEGs, we hypothesized that more changes in gene expression are required during the change from white petals to yellow or purple than yellow to purple. This is consistent with the peach flesh changes studied by Hong Ying et al. [54]. Furthermore, we identified 816 DEGs as conserved between three groups (Fig. 6B), suggesting their involvement in flower color regulation.

Fig. 6
figure 6

Multivariate statistical analysis of transcriptome data in three Asiatic cotton petals samples. A The number of differentially expressed genes in each comparison group. B Venn diagram of DEGs in petals of different colors. C KEGG enrichment of DEGs in each comparison group. The y-axis indicates the KEGG pathway, and the x-axis indicates the name of each group. Red characters indicate flavonoid synthesis-related pathways and green words show other key metabolic pathways

All the 8,178 DEGs (detected in at least one comparative combination) were further annotated using KEGG metabolic pathways and presented in Fig. 6C. It is noteworthy that among the top 25 enriched pathways, we can observe that DEGs are significantly involved in metabolism, carbohydrate metabolism, pentose and glucuronate interconversions, biosynthesis of other secondary metabolites, photosynthesis and plant-pathogen interaction, but a few important genes were also enriched in phenylpropanoid biosynthesis, carotenoid biosynthesis flavonoid biosynthesis, flavone and flavonol (Fig. 6C). Meanwhile, the differential genes are mainly enriched in processes such as photosynthesis and energy metabolism in purple flowers compared to white and yellow flowers. The above results suggest that changes in differential gene expression result in significant changes in metabolic activity. This is similar to the study of the metabolome. To clarify the cellular composition, molecular functions, and biological processes involved in DEGs, we mapped all 8,178 DEGs to the GO database for enrichment analysis (Fig. S4) (Additional file 8: Figure S4). As shown in Figure S4, about 79.13% of the genes had catalytic activity (32.01%, 2,618), transferase activity (12.07%, 987), small molecule binding (11.98%, 980), nucleotide (11.53%, 943) and nucleoside phosphate binding (11.53%, 943). In the cellular component category, about 16.36% of genes constitute the cellular anatomical entity, 8.52% are involved in membrane composition, and more than 22.65% of genes are enriched in biological processes such as macromolecule modification, cellular protein modification process and protein modification process.

Flavonoid biosynthesis and differential expression of regulatory genes

Combining the KEGG enrichment results and gene function annotation, we screened important DEGs in the phenylpropanoid-flavonoid synthesis pathway, including the flavonoid biosynthesis (ko00941) pathway, phenylpropanoid biosynthesis (ko00940) pathway, isoflavonoid biosynthesis (ko00943) pathway and flavone and flavonol biosynthesis (ko00944) pathway, with the number of DEGs ranging from 2 to 66. There are 24 key DEGs involved in the flavonoid synthesis pathway potentially associated with color variation, including 4CL (Ga05G0055, Ga05G1511, Ga08G1805, Ga01G2437), CHI (Ga13G0234, Ga04G1997), CHS (Ga09G0006, Ga10G1445, Ga10G1446, Ga05G3486), DFR (Ga05G2037, Ga06G0096), F3'H (Ga11G2145), FL (Ga05G2477, Ga04G1879), LAR (Ga12G1133), PAL (Ga02G1655, Ga04G0847, Ga09G1700, Ga11G0075), ANR (Ga05G1789), ANT17 (Ga08G2083), and UFGT (Ga11G2554, Ga02G0536). The expression profile of these genes in different-colored flowers suggested significant variation (Fig. 7A). We calculated Pearson correlation coefficients (Table 1) to identify key genes regulating anthocyanin synthesis to explore the relationship between DEGs and total anthocyanin accumulation. The results showed that 13 DEGs promoted anthocyanin synthesis, and 10 DEGs suppressed anthocyanin synthesis. Among them, the expression levels of Ga05G2037 (DFR), Ga02G1655 (PAL), Ga05G3486 (CHS), Ga04G0847 (PAL), Ga06G0096 (DFR), Ga10G1446 (CHS), Ga02G0536 (UFGT), Ga13G0234 (CHI), Ga10G1445 (CHS) and Ga08G2083 (ANT17) were significantly and positively correlated with the total anthocyanin content (R2 ≥ 0.95), suggesting that these 10 genes play an important role in anthocyanin accumulation.

Fig. 7
figure 7

Heatmap showing the expression levels of the differentially expressed structural genes (A) and TFs (B) associated with color (FPKM value). Orange indicates high expression and blue indicates low expression

Table 1 Differential expression of structural genes in each comparison group

Transcription factors are a major player in regulating the expression of structural genes and, in turn regulating metabolite synthesis. In this study, we identified and classified 734 TFs, mainly from MYB, AP2/ERF, NF-Y A/B/C, WRKY, and bHLH families (Fig. S5) (Additional file 9: Figure S5). MYB, bZIP, WRKY, and bHLH families play crucial roles in the flavonoid and anthocyanin biosynthetic pathways [45]. These were also differentially expressed in different color petals, where the differences were mainly concentrated in W_Flo vs. P_Flo (144). In brief, both DEGs and TFs were more abundant in white flowers than in purple flowers, explaining why pelargonidin and cyanidin accumulation was highest in P_Flo. In addition, the correlation coefficients between TFs and total anthocyanin content indicated a total of 29 positively regulated TFs (R2 > 0.9) (Table 2), among which MYB was predominant and might act as a promoter of anthocyanin accumulation. Four negatively regulated (R2 < 0.9) TFs, including PIF, bHLH, MYB, and ATHB, may be repressors in anthocyanin synthesis. The results showed that the positively regulated transcription factors were expressed at the highest level in P_Flo, while the negative regulators were expressed at the highest level in W_Flo (Fig. 7B).

Table 2 Differential expression of known TFS in each comparison group

Regulatory networks between metabolites and genes associated with flower color

In addition to anthocyanins, quercetin and gossypetin were also considered equally vital substances resulting in yellow petals prior to the characterization of carotenoids [24, 25]. Moreover, quercetin and kaempferol were found to accumulate in high amounts in white Primula vulgaris [26]. Therefore, to understand the regulatory relationships between anthocyanins, quercetin, gossypetin, and kaempferol in pigmentation, we performed correlation analysis between key anthocyanins (2), gossypetin (5), quercetin (10), and kaempferol (5) (Table 3), with DEGs in the phenylpropanoid-flavonoid synthesis pathway to construct a network (Table S5) (Additional file 10: Table S5).

Table 3 List of the key DAFs regulating flower coloration in the 3 flower types and their Log2FC values

The regulatory network revealed genes associated with key metabolites and depicted compounds relevant to petal color (Fig. 8). Kaempferol was significantly associated with the white corolla, which is in agreement with the Li’s study. [26]. The colorless kaempferol showed high accumulation levels in W_Flo. Hmcp002029 (kaempferol-O-Pentoside-O-hexoside), pme0321 (kaempferol-7-O-rhamnoside), Lmmn003398 (kaempferol acetyl-glucoside) showed the most significant fold change among the three compared combinations. The expression pattern of two genes (PAL (Ga11G0075) and 4CL (Ga05G0055)) depicted a significant positive correlation with Hmcp002029 accumulation. Moreover, MYBs, FL (Ga05G2477) and F3H (Ga11G2145) jointly negatively regulate the synthesis of pme0321. The genes significantly associated with Lmmn003398 were mainly TFs (MYB and WRKY). The metabolite RFG0001-der20 (gossypetin-fer) was specific in W_Flo, with the value of Log2 Fold change up to 13.92. Transcriptome data showed that MYB (Ga07G1423, Ga12G1064, Ga11G2161, Ga02G1643, Ga03G2432, Ga12G1767), and WERKY (Ga04G1859) were highly correlated with Rfg0001-der20. Besides, Ga05G0055 (4CL1) and Ga04G1879 (FLS) were also positively correlated with RFG0001-der20 (R2 > 0.95). We speculate that these genes positively regulate the synthesis of gossypetin-fer. Previous studies have reported that TFs are essential in the biosynthesis of colorless kaempferol-associated flavonols and cooperate with structural genes to control flower color. Therefore, we hypothesize that MYBs and WRKY are central genes co-expressed or regulated with 4CL, FLS, and PAL, thus responsible for forming the white corolla. For Y_Flo (Fig. 8), the results showed a significantly higher accumulation of quercetin and gossypetin compared to white flowers, especially pme3130 (Log2 FC = 10.12), Lmmp003266 (Log2FC = 9.65), Lmmp003271 (Log2 FC = 7.02), and pma0214 (Log2 FC = 6.55). Interestingly, Ga05G0055 (4CL1) and Ga04G1879 (FLS) were negatively correlated with Lmmp003271, pme3130, and pma0214, with high expression of these two genes, suppressed gossypetin 7-glucoside, quercetin 4'-O-glucoside, and methylquercetin O-hexoside synthesis. Although Lmmp003266 accumulated only in trace amounts in Y_Flo, it was significantly higher in yellow flowers than in white and purple flowers. Some members of the bHLH, MYB, and WERKY family, such as MYB12, MYB30, MYB306, MYB1236, MYB44, MYB61, bHLH14, WRKY2, were expressed and had the lowest expression in Y_Flo but the highest content of quercetin or gossypetin, which are responsible for yellow petals, are negatively regulated by these TFs. The co-expression network (Fig. 8) showed that multiple metabolites regulated the purple petal species, and four kinds of metabolites were present in the purple flower species. The contents of pma3392 and pmb0542 gradually increased in the three groups, with the purple petals having the highest abundance, indicating they are essential for purple petals. Pma3392 accumulation not only relied on the positive regulation of CHS (Ga10G1445, Ga10G1446, Ga05G3486), DFR (Ga06G0096, Ga05G2037), FL (Ga05G2477), and MYB (Ga09G0340, Ga05G2658, Ga02G0929, Ga09G0647, Ga08G2734) but was also negatively regulated by bHLH (Ga08G1465). Pmb0542 was mainly regulated by MYB, where MYB4 (Ga03G2139) positively regulated the accumulation of pmb0542, while MYB16 (Ga04G1608) negatively regulated the synthesis of pmb0542.

Fig. 8
figure 8

Co-expression networks of differentially expressed genes and metabolites associated with pigmentation. Metabolites including kaempferol, quercetin, gossypiin, and anthocyanins, are shown in milky white, yellow, and pink, respectively. DEGs involve phenylpropanoid biosynthesis, flavonoid biosynthesis, isoflavonoid biosynthesis, flavone and flavonol biosynthesis and transcription factors. The size of the diamond and circle represents the genes FPKM and the metabolite content, respectively. The milk-white diamonds are kaempferol; the yellow diamonds mean quercetin or gossypiin, and the red diamonds are anthocyanins. The names of the metabolites are as follows: 1. Hmcp002029; 2. Lmmp002 l 30; 3. mws 1043-N; 4. Lmmn003398; 5. pme0321; 6. Lmmp002529; 7. Rfg0001-der20; 8. Rfg0376-der08; 9. pma0214; 10. Lmmp003266; 11. Rfg0001-der07; 12. pme3369; 13. pme3130; 14. pme2954; 15. Lmyp004052; 16. Mws1139-N; 17. Lmmpoo3271; 18. Rfg0380-der04; 19. mws0855; 20. mws0045; 21. pme3392; 22. pmb0542. The solid line indicates that genes and metabolites are positively regulated, while the dotted line indicates negative regulation

In general, the purple petals appeared to be formed by the differential accumulation of anthocyanins, quercetin and are influenced by several metabolites. The yellow petals seemed to be most influenced by gossypetin and quercetin. On the other hand, the white petals resulted from the weak accumulation of Rfg0001-der20 and the high kaempferol content.

Discussions

The flower color in Asian cotton ranges from white to yellow to purple. Brightly colored petals attract insects, spreading pollen and enriching Asian cotton's germplasm resources. Asian cotton can also be used as ornamental due to its eye-appealing flower characteristics. Moreover, flavonoids are involved in forming and developing flowers, fruits, and seeds in plants and other functions such as antioxidant activity, UV protection, and protection against biotic and abiotic stresses. The study of flavonoids in cotton petals could provide a new way for healthy anthocyanin extraction and thus increasing the value-addition of Asian cotton. To date, metabolome concerning flower petals in Asian cotton has not been reported. In this study, we systematically implied metabolomics and transcriptomics approaches to identify the major flavonoid metabolites affecting Asian cotton petals and screened key genes associated with petal color formation, providing important information for the enrichment of cotton germplasm resources.

The important role of anthocyanins in the color formation of Asian cotton petals

Anthocyanins are the primary factor influencing the coloration of different plant organs [34,35,36,37, 55, 56]. Several studies have demonstrated the essential role of anthocyanins in color formation. For instance, Xue et al. [34] depicted association of anthocyanins with red-colored seed-coat in peanut. A study by Qiu et al. concerning passion fruit [57] found that the total anthocyanin content of purple fruit was significantly higher than that of yellow fruit, raising the potential value of passion fruit as a functional food. Purple wheat [58] has special health benefits, with the color of purple wheat seeds deepening and the total anthocyanin content increasing over time during the developmental stages of purple wheat. White, yellow, blue and pink Primula vulgaris [26] showed a gradual increase in total anthocyanin content as the color deepened. Similarly, Asian cotton petals are rich in anthocyanins, so they can also be used as a natural source of pigments in food. Through phenotypically identifying and comparing the total anthocyanin content of different varieties of Asian cotton, we found that the total anthocyanin content of white, yellow, and purple petals was significantly different. Therefore, purple petals can be used as a natural source of anthocyanins to enhance the added value of Asian cotton.

The potential metabolites involved in flower coloration

The determination of secondary metabolite content and species has an essential function in the research exploitation of species. Flavonoids are vital secondary metabolites in plants [59, 60]. During plant growth and development, flavonoids fulfill a variety of physiological functions. They changed the color of plant organs [61], defended against biological attacks such as pathogenic bacteria [62], and inhibited the harm of abiotic factors such as temperature, drought, and salinity [55, 63,64,65]. Therefore, we performed a qualitative and quantitative analysis of secondary metabolites in Asian cotton petals. A total of 134 flavonoids were identified by metabolomic analysis. Forty-eight metabolite components (including 32 DAFs) of flavonoids, flavanols and anthocyanins were enriched in the well-known metabolic pathways. According to the analysis of W_Flo, Y_Flo, and P_Flo flavonoid compounds, we found a dramatic shift in the content of metabolites in the phenylpropanoid flavonoid biosynthetic pathway along with the deepening of petal color, with a gradual decrease in the content of catechin derivatives such as gallocatechin. Therefore, we speculated that catechins are necessary precursors for synthesizing anthocyanins, consistent with the results of Wang et al. [58] on purple wheat seeds. Moreover, other flavonoid metabolites showed multiple trends, indicating that Asian cotton petal color regulation is a complex and dynamic process regulated by multiple factors.

More than 20 kinds of anthocyanin have been identified so far, among which cornflower, geranium, paeoniflorin, petunia pigments, ariocyanin and mallow pigments are more common in plants [66]. In this study, two key anthocyanins were identified: cyanidin and pelargonidin. Cyanidin causes red–purple color variation, while pelargonidin contributes to orange and red [67]. Pelargonidin 3-O-beta-D-glucoside and cyanidin -3-O-(6''-Malonylglucoside) depicted differential accumulation in all three petals. Xue et al. [68] emphasized the regulatory role of cyanidin and pelargonidin in strawberry flower color, resulting in darkening of flower color. Quercetin and gossypetin were considered a major yellow pigment when carotenoids were not found in P. vulgaris [24, 25]. By analyzing the phenylpropanoid-flavonoid biosynthetic pathway, we identified 12 quercetin and 8 gossypetin that differed significantly in different color petals. Quercetin 4'-O-glucoside, methylquercetin O-hexoside, gossypetin 7-glucoside, and quercetin glu-malonyl-pen were specific in the yellow group, and we predicted them to be the main metabolites responsible for yellow petals. Although anthocyanin components were detected in W_Flo, they may be due to the effect of floral base spots. However, the white petals contained higher levels of kaempferol than the other two groups, especially kaempferol acetyl-glucoside and kaempferol 3-O-galactoside had the highest content, and kaempferol-O-pentoside-O-hexoside had the highest foldchange, which may be an important factor in regulating the pigmentation of white petals.

The key DEGs responsible for the phenylpropanoid – flavonoid biosynthesis pathway

Transcriptome analysis is important for identifying genes responsible for a stage-specific trait. The analysis of different comparison groups revealed that only a few genes showed an up-regulation trend in expression as petal color deepened, suggesting that the process of color deepening requires only a few key involvements. Moreover, the functional enrichment of DEGs indicated that changes in differential gene expression caused significant changes in metabolic activities. Some key secondary metabolites in plants have defensive functions and chemosensory effects against pests and diseases [69, 70]. However, we also found that anthocyanin biosynthesis is linked to the phenylpropanoid-flavonoid synthesis pathway, carbohydrate metabolism, pentose and glucuronate interconversions, and plant-pathogen interaction. Sugar, carbohydrate, and plant hormone play a crucial role in anthocyanin biosynthesis [58]. We also screened important DEGs in the phenylpropanoid flavonoid synthesis pathway by analyzing the transcriptome, including 4CL, CHI, CHS, DFR, F3'H, FL, LAR, PAL, and UFGT, and most of these structural genes were significantly expressed in purple petals [71]. We further selected 10 structural genes that were notably associated with the total anthocyanin content, suggesting critical roles of DFR, PAL, CHS, UFGT and CHI in accumulating anthocyanins [72]. We also identified MYB that may act as promoters of anthocyanin accumulation; Ga11G2616(PIF), Ga08G1465(bHLH), Ga08G1561(MYB) and Ga02G0399(ATHB) that may act as repressors of anthocyanin synthesis [73,74,75]. These results suggest that these 10 structural genes and 29 TFs are candidate genes responsible for regulating petal color traits.

Co-expression networks of key metabolites and genes

The combined transcriptome and metabolome analysis is an important tool for identifying the genes responsible for the different petal colors [48, 76, 77]. Correlation analysis between metabolome and transcriptome showed that the expression levels of some DEGs were closely related to the accumulation of flavonoid metabolites. Therefore, we constructed an interaction network containing 161 DEGs involved in the phenylpropanoid-flavonoid synthesis pathway and 29 flavonoids (7 kaempferol, 10 quercetin, 5 gossypetin, and 2 anthocyanins). Overall, purple petals were regulated by multiple metabolites, with kaempferol, quercetin gossypetin and anthocyanins all present in the purple petals. Pelargonidin 3-O-beta-D-glucoside and cyanidin-3-O-(6''-malonylglucoside) are highly accumulated in P_Flo and are essential for the formation of purple petals. Combined with the correlation analysis, CHS and DFR were considered as key structural genes in the flavonoid biosynthesis pathway in purple petals [78,79,80]. As we all know, CHS is a firmware rate-limiting enzyme of flavonoid biosynthesis that affects downstream secondary metabolites, while DFR is a key enzyme in the anthocyanin synthesis pathway that controls the accumulation of colorless anthocyanin-like metabolites in plants [19, 81]. These results suggest that the expression of flavonoid biosynthetic genes contributes to the accumulation of anthocyanins in purple petal forming. Our findings show that white petals are formed because of the weak accumulation of gossypetin-fer and high levels of kaempferol-7-O-glucoside, kaempferol 3-O-galactoside, and kaempferol acetyl-glucoside. These kaempferol compounds are synthesized in MYBs and bHLH co-expressed with 4CL and PAL resulted. It is consistent with the study of P. vulgaris [26]. Yellow petals are regulated by most gossypetin, mainly negatively regulated by 4CL, FLS, MYB, bHLH, and WERKY. Surprisingly, the co-expression network of Lmmp003271 (gossypetin 7-glucoside) indicated that Ga05G0055 (4CL1) had negative regulation of its accumulation, but this gene also positively regulated the synthesis of Rfg0001-der20 (gossypetin-fer). It implies that the low expression of Ga05G0055 increases the accumulation of gossypetin 7-glucoside and inhibits the synthesis of gossypetin-fer.

Conclusions

In the present study, we combined transcriptomic and metabolomic analyses to elucidate the mechanism of color differences between different Asian cotton varieties on a flowering day. A total of 122 flavonoid metabolites (including 5 anthocyanins) were identified in the three groups of samples. Moreover, we summarized the important metabolites and genes in the phenylpropanoid-flavonoid synthesis pathway, key factors intimately associated with petal color. We preliminarily screened for DEGs significantly associated with the phenylpropanoid-flavonoid synthesis pathway and, more importantly, identified transcription factors positively and negatively regulating anthocyanin content. In addition, 10 DEGs and 29 TFs played vital roles in anthocyanin accumulation. Ultimately, color-associated co-expression networks were constructed by joint analysis to screen metabolites and candidate genes associated with different petal colors. We have elucidated the metabolites and major genes that regulate Asian cotton petal color through comparative and integrative analyses, providing a basic framework and key basis for further studies on cotton flower color. However, further studies on these candidate metabolites and genes are needed to clarify the molecular mechanisms.

Materials and methods

Plant material

Three varieties of Asian cotton with different colored petals, Shixiya1 (white), GA0146 (yellow) and GA0149 (purple), were planted in three rows at the experimental base of the Institute of Cotton, Chinese Academy of Agricultural Sciences, Anyang, Henan Province (36°07′ N, 114°50′ E). Five completely open flowers from each row on the same day were taken for phenotype observation and their petals for total anthocyanin and RNA extraction. All samples were frozen in liquid nitrogen immediately after sampling and stored immediately in a -80 °C refrigerator.

Measurement of total anthocyanin content

The method of extracting total anthocyanins from Camellia sinensis petals by Fu et al. [82] was used and the extraction conditions were optimized. Asian cotton petals were ground into powder in liquid nitrogen. About 0.65 g of dry powder was added to 20 ml of 95% (0.1 mol L−1 HCL) ethanol solution and then heated in a water bath at 60 °C for two hours. Finally, the absorbance values of the extracts were measured by the BioTek microplate reader (Gene Company Limited, America) at 520 nm, 620 nm and 650 nm. The total anthocyanin content was calculated by the formula: A = (A530-A620)-0.1(A650-A620), anthocyanin content (mmol/g FW) = A × V × 1,000/489.72 m, where V represents the volume of the extract and m represents the weight of the dried petal powder. Ninety-five percent (0.1 mol L−1 HCL) ethanol solution was used as the blank control. Three replicates were analyzed for each sample.

Metabolite identification and quantification

A series of metabolite extraction, identification and quantification procedures were carried out at Wuhan Metware Biotechnology Co., Ltd (https://www.metware.cn) following the company's standard procedures [83, 84]. Cryo-preserved samples were weighed and extracted with 1.0 ml of 70% methanol at 4 °C. Extracts were analyzed using liquid chromatography-mass spectrometry/M.S. analysis (LC–MS/MS, UPLC, Shim-pack UFLC SHIMADZU CBM30A system; MS, Applied Biosystems 6500 QTRAP). All metabolites were identified and quantified by Metware's metabolite database and public metabolite database. Differential accumulation of metabolites (DAMs) between samples was identified using orthogonal partial least squares discriminant analysis. Metabolites with |Log2 Foldchange|≥ 1 and VIP (variable importance in project) ≥ 1 were defined as DAMs.

Transcriptome sequencing and analysis of Asian cotton flower

Total RNA was extracted from the samples (Shixiya 1, GA0146 and GA0149) with the RNA Extraction kit (TIANGEN, Beijing, China). The RNA quality and concentration were assessed with agarose gel electrophoresis and NanoDrop2000 spectrophotometer. RNA sample quality testing, library construction and sequencing for each sample were done at Biomarker Biotechnology (http://www.biomarker.com.cn). The cDNA libraries were sequenced on an Illumina NovaSeq 6,000 platform with generating paired-end reads. Low-quality data containing adapter and poly-N were removed for downstream analysis. The resulting set of high-quality clean reads was used for transcriptome analysis. The clean reads were localized to the Asian cotton reference genome (CRI-v1.0) (https://www.cottongen.org/data/download) [85] using Hisat2 to obtain unigenes. Fragments per kilobase of exon model per million mapped fragments (FPKM) for all genes to determine gene expression values. The differentially expressed genes (DEGs) were identified by R package DESeq2 for subsequent analysis. The genes featuring FDR < 0.05 and |Log2 Fold change|≥ 1 were considered DEGs. The Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases were applied further to annotate the identified DEGs from each pairwise combination using TBtools software [86]. Taking q-value < 0.05 as the criterion, twenty-five significant pathways were selected. Heat maps were generated using the OmicStudio tools at https://www.omicstudio.cn/tool.

Considering the important role of phenylpropanoid-flavonoid pathway structural genes and TFs in anthocyanin synthesis, the related structural genes and TFs in all samples were identified. Combined with KEGG enrichment results and transcription annotation files, all the annotated TFs were retrieved by searching the transcription annotation file. The related structural genes were filtered in ko00940, ko00941, ko00943 and ko00944 pathways. The Pearson correlation coefficients (R2) between key genes in the phenylpropanoid-flavonoid pathway and the total anthocyanin content of each sample were calculated using the R package Hmisc [87]. The TFs with |R2|> 0.9 and the structural genes with |R2|> 0.95 were retained for subsequent analysis.

Integrative analysis of key metabolites and genes

To analyze the interactions between genes and metabolites related to flower coloration, we constructed a co-expression network between key DAMs and important DEGs. First, we calculated the mean of key DAMs content and important DEGs expression of three biological replicates in each sample. Then, R package Hmisc was used to calculate the correlation coefficient between metabolites and genes. The DAMs and DEGs between three colored cotton were selected when |R2|> 0.8. Finally, co-expression networks were constructed using the correlation coefficients of genes and metabolites to reveal the interactions between petal color metabolites and genes. The co-expression networks were visualized using Cytoscape software (version 3.8.2).

The quantitative real time-PCR validation

Total RNA was reverse-transcribed in a 20 μL reaction mixture using the HiScript® II Q RT SuperMix for qPCR (+ gDNA wiper) kit (Vazyme, China). The PCR product was examined by 1.2% agarose gel (Tsingke, China) electrophoresis. If the band size and brightness met the requirements, the cDNA was of good quality and could be used for qRT-PCR experiments. The 20 μL reactions were performed with 10 μL of ChamQTM SYBR® qPCR Master Mix (High ROX Premixed), 1.0 μL 10 mM forward and reverse primers, 7.0 μL of ddH2O and 2.0 μL 5 times diluted cDNA template. The cotton Histone3 was used as an internal reference gene. The ABI Prism 7500 Fast system was used to perform qRT-PCR. The primers used for qRT-PCR are listed in Table S6 (Additional file 11: Table S6). The relative expression of genes was calculated by the 2−ΔΔCT method [88].