Introduction

Angiosperms are characterized by an enormous diversity of flower hues and shapes [1, 2]. Some plant species maintain their brilliant colors throughout the year, while others constantly transform as the seasons change. The substances responsible for these colors are pigments which include flavonoids, betalains, and carotenoids [3]. These pigment classes differ in their biochemical properties resulting in distinct color ranges. Flavonoids can be classified into multiple subgroups with anthocyanins forming the most colorful subgroup. Anthocyanins can provide orange, red, purple, blue, or almost black coloration [4]. Carotenoids lead to yellow, orange, or red coloration [5]. Betalains can be classified into yellow betaxanthins and red betacyanins [3]. Flavonoid biosynthesis and its regulation are among the best understood processes in plants. This comprehensive understanding makes flavonoids an invaluable system for exploring the mechanistic basis of phenotypic differences in plant coloration. We begin by reviewing the extensive body of knowledge regarding the biochemistry of flavonoid pigmentation before examining trends among the substitutions that commonly contribute to color variation.

Functions of flavonoids

Anthocyanins and other flavonoids, including flavones, flavonols, and proanthocyanidins, are a group of specialized plant metabolites responsible for numerous functions beyond coloration. Additional physiological functions include protection against herbivores [6, 7] and reduction of the impact caused by salinity [8], drought [9], and UV-radiation [10, 11]. Associated with their color are ecological functions such as the attraction of pollinators and seed dispersers which facilitates reproduction [6, 12]. Flavonols also contribute to the attraction of pollinators by forming guiding signals on flowers which are invisible to the human eye [12,13,14,15]. Anthocyanins are known to be responsible for the coloration of flowers, with hues ranging from red and orange to purple and blue. The diversity in colors depends on the chemical structure of the anthocyanin compound which includes the number of hydroxyl groups attached to the benzene ring, and the level of glycosylation [16, 17] and acylation [18]. Several reports suggest that the interaction with copigments like flavonols and flavones is an important factor for the stabilization of anthocyanins in plants [19, 20]. These co-pigments can also alter the hue of the anthocyanins participating in the complex formation [21, 22]. Environmental factors can also influence the color stability of anthocyanin pigments. For example, a plant exposed to an acidic soil can produce anthocyanins with an intense red or orange color [23]. Plants exposed to high temperatures can show degradation of anthocyanins while low temperatures can increase color intensity [24,25,26].

Flavonols, flavones, and proanthocyanidins have individual biological functions and can influence the coloration of different plant organs [27]. The characteristic cream white or pale yellow color, determined by the presence of flavones and flavonols, can be observed in leaves or petals of Taraxacum officinale “dandelion” [28], Chrysanthemum grandiflorum cv. Jinba [29], and Chrysanthemum morifolium [30]. Proanthocyanidins also called condensed tannins are colorless compounds that turn brown upon oxidation [31]. They have been studied in seeds of species such as Arabidopsis thaliana [32], Brassica napus [33], and Ipomoea purpurea [34].

Structural genes of the flavonoid biosynthesis

The general pathway of the flavonoid biosynthesis (Fig. 1A) is well understood and the central aglycon biosynthesis is largely conserved among land plants [3, 35]. It starts with the condensation of 4-coumaroyl-CoA and malonyl-CoA to synthesize naringenin chalcones which are later isomerized by the enzyme chalcone isomerase (CHI) to form naringenin, a colorless flavanone. In the next step, the pathway diverges: flavanone can either be hydroxylated by the flavanone 3-hydroxylase (F3H) to form dihydroflavonols or it can be oxidized through the activity of flavone synthase (FNS) and form flavones. After the hydroxylation of naringenin to dihydrokaempferol, the formation of dihydroquercetin and dihydromyricetin can take place through the catalysis of flavonoid 3’-hydroxylase (F3’H) and flavonoid 3’,5’-hydroxylase (F3’5’H), respectively. Subsequently, two enzymes can accept these intermediates and produce either flavonols through oxidation with flavonol synthase (FLS) or leucoanthocyanidins by reduction with dihydroflavonol 4-reductase (DFR). When the latter occurs, another enzyme called anthocyanidin synthase/leucoanthocyanidin dioxygenase (ANS/LDOX) catalyzes the synthesis of anthocyanidins. This step also requires an anthocyanin-related glutathione S-transferase (arGST) that was named AN9/TT19 due to the corresponding mutants [36, 37], but the enzymatic function was only revealed recently [38]. These anthocyanidins can be further modified through different steps including (Fig. 1B) (1) glycosylation in the presence of UDP-glucose flavonoid 3-O-glucosyl transferase (UFGT), (2) methylation through the activity of O-methyltransferase (OMT), and (3) acylation by the anthocyanin acyltransferase (ACT). Moreover, leucoanthocyanidins and anthocyanidins can also be reduced by the enzymatic activity of leucoanthocyanidin reductase (LAR) and anthocyanidin reductase (ANR), respectively, to synthesize catechins and epicatechins leading to the production of proanthocyanidins.

Fig. 1
figure 1

(A) Schematic representation of the general flavonoid biosynthesis pathway. (B) Simplified flowchart describing the biosynthesis pathway of anthocyanins. Enzyme names are abbreviated as follows: PAL - phenylalanine ammonia-lyase, C4H - cinnamic acid 4-hydroxylase, 4CL−4-coumarate-CoA ligase, CHS - chalcone synthase, CHI - chalcone isomerase, F3H - flavanone 3-hydroxylase, F3’H - flavonoid 3’-hydroxylase, F3’5’H - flavonoid 3’,5’-hydroxylase, DFR - dihydroflavonol 4-reductase, ANS/ LDOX - anthocyanidin synthase /leucoanthocyanidin dioxygenase, arGST – anthocyanin-related glutathione S-transferase, UFGT - UDP-glucose: flavonoid 3-O-glucosyltransferase, FLS - flavonol synthase, ANR - anthocyanidin reductase, FNS - flavone synthase. (C) Scheme of the different intracellular flavonoid transport mechanisms: (1) vesicle trafficking from the Endoplasmic Reticulum (ER), these vesicles may also incorporate Anthocyanin Vacuolar Inclusions (AVI) for efficient transport and storage within the vacuole, (2) membrane transporter throughout the multidrug and toxin extrusion transporter (MATE), and (3) transport might be mediated by the glutathione S-transferase (GST) and the ATP-binding cassette (ABCC) transporter. GST is presented twice due to the recently reported enzymatic function of arGSTs by Eichenberger et al [38]. It is currently not clear if GST functions only as an enzyme or if it also plays a role in anthocyanin transport

Intracellular transportation of flavonoids

Like many other specialized metabolites, flavonoids are synthesized in the cytoplasm. It is assumed that some enzymes responsible for catalyzing specific reactions in the flavonoid biosynthesis are attached to the endoplasmic reticulum (ER) and form a metabolon [39](Fig. 1C). Flavonoids produced at the ER are transported into the vacuole for storage [40,41,42] which results in observable pigmentation. The transport of these metabolites to the vacuole is a process that is not fully understood, but different models have been proposed that could explain observations from several experiments. Two widely accepted and well-known models are: (1) vesicle trafficking from the ER to the vacuole and (2) GST-mediated transport to the tonoplast, where membrane-based transporters are active [41]. Both models have in common a mechanism that is required to transport flavonoids across a membrane and these models are not mutually exclusive. It is feasible that these transport routes are active to certain degrees under different conditions, in developmental stages, or in different plant parts. The vesicular transport model proposes the accumulation of flavonoids inside the ER lumen and formation of small flavonoid rich compartments surrounded by a membrane that move to the central vacuole [39, 42]. The existence of these vesicles has been reported mainly in Zea mays [43], A. thaliana [44], Vitis vinifera [45], and Oryza sativa [46]. Microscopic evidence shows that these vesicular bodies are attached to the surface of the ER [45, 47] from where they are released into the cytoplasm and mobilized directly into the vacuole either by fusing with carrier proteins, or mobilized indirectly by following the trans-Golgi Network (TGN) transport pathway [48]. The GST-mediated transport model proposes that flavonoids are delivered to the tonoplast by ligandins [36, 49]. These ‘ligandins’ would be glutathione S-transferase (GST) binding and carrier proteins [50]. Evidence for the role of GST in flavonoid transport has been reported in multiple species, such as Zea mays [49], A. thaliana [36, 37], Petunia hybrida [50], Vitis vinifera [45], and Prunus persica [51]. Reconsideration of the GST-mediated transporter is needed in the light of a recent study [38] that revealed an enzymatic function in the anthocyanin biosynthesis for anthocyanin-related GSTs (arGSTs). The tonoplast-based transport mechanism involves different transmembrane channels which enable translocation of flavonoids into the vacuole [39]. These routes were reported to involve the Multidrug Resistance-associated Protein (MRP), belonging to the family of proteins ATP-binding cassette (ABC) actively transporting anthocyanins [52]. Using an electrochemical H + gradient to transport substances across membranes, Multidrug and Toxic compound Extrusion (MATE) is considered to regulate the vacuolar sequestration of proanthocyanidin precursors in the seed coat cells [52,53,54].

Regulation of the flavonoid biosynthesis by transcription factors

The transcriptional activity of genes encoding enzymes of the flavonoid biosynthesis pathway is controlled by numerous transcription factors or even protein complexes comprising of multiple transcription factors (Fig. 1A). Genes of the flavonol and flavone biosynthesis are largely regulated by MYB11, MYB12, and MYB111 [55,56,57]. The mechanism that regulates the expression of all structural genes in the anthocyanin biosynthesis pathway is commonly known as the MBW complex [58]. The name of the MBW complex is based on the three involved transcription factors: R2R3-MYBs, basic helix-loop-helix (bHLH) proteins, and WD40 proteins. One member of each of the three protein families is required for the complex formation. Different members of the MYB and bHLH family can participate resulting in combinatorial diversity [58]. After the discovery of TT8 in A. thaliana [59], Baudry et al. [60]. demonstrated the activity of the MBW complex in regulating the expression of the proanthocyanidins (PA) biosynthesis gene BANYULS (BAN). The ternary complex responsible for BAN regulation in A. thaliana is composed of TT2 (MYB123), TT8 (bHLH42), and TTG1 (WD40 family). Years later, it was demonstrated that the anthocyanin biosynthesis is also controlled by MBW complexes [61]. These complexes harbor PAP1, PAP2, MYB113, or MYB114 as the MYB component and GL3 or EGL3 as bHLH component as well as the WD40 protein TTG1 [61].

Activation of the anthocyanin biosynthesis by the MBW complex is evolutionary conserved across angiosperms [62], but the individual components involved can vary between species [63, 64]. The MBW complex controls other biological processes in plants like trichome formation, root hair development, and proanthocyanidin biosynthesis [65]. Multiple functions lead to different degrees of evolutionary constraints on the individual components of the MBW complex. A number of different MYB partners can participate in the MBW complex and are considered as the specificity determining factor of the MBW complex. For example, PAP1/MYB75, PAP2/MYB90, PAP3/MYB113, PAP4/MYB114 activate the anthocyanin biosynthesis in Arabidopsis [66], while TT2/MYB123 would activate the proanthocyanidin biosynthesis [60]. MYB5 was described as another anthocyanin activator in Fragaria [64] and proanthocyanidin regulators were reported to activate the anthocyanin biosynthesis in Vaccinium [67]. It is also known that different bHLHs can be involved in the MBW complex [68, 69] and that TTG1 can be replaced by LWD1 in strawberry [64]. The DNA binding and protein-protein interaction capacity of MYBs and bHLHs is determined by highly conserved regions [70]. It has been postulated that TTG1 serves as a scaffolding protein that maintains the interaction of MYB and bHLH [71]. This protein-protein interaction involves five WD repeats that account for over 60% of the protein length. Previous reports suggest that the MYB component is most often associated with changes in flower pigmentation indicating low constraints on this component of the MBW complex due to higher functional specialization [72,73,74].

Evolutionary patterns of anthocyanin pigmentation

Closely related plant species can differ in their anthocyanin repertoire and pigmentation pattern [75, 76]. Such pigmentation differences have frequently been reported between plants of the same species [1, 77,78,79]. To the best of our knowledge, there are no reports about anthocyanin loss in any major taxonomic lineages except for the replacement of anthocyanins by betalains in some families of the Caryophyllales [80,81,82]. This suggests that anthocyanin pigmentation differences are often intraspecific. All structural genes in the anthocyanin biosynthesis pathway must be functional and active to achieve anthocyanin pigmentation. Mutations in any regulator or enzyme encoding gene of the flavonoid biosynthesis can affect the coloration. Pigmentation differences have been studied in many plant species including A. thaliana [83], Vitis vinifera [84], Malus domestica [85], Solanum lycopersicum [86], Hordeum vulgare [87], Nicotiana tabacum [88], and Nicotiana alata [89]. Substrate competition between different branches of the flavonoid biosynthesis can also have an impact on the anthocyanin accumulation [90]. For example, an increased flavonol production can lead to a pigmentation loss due to insufficient substrate for the anthocyanin biosynthesis [91]. These natural differences in pigmentation provide an excellent system to study evolutionary processes that lead to the inactivation of a pathway. In theory, a biosynthesis pathway could be interrupted at any of the successive steps [92], but previous research suggests that some genes are more often responsible for pigmentation loss than others [74, 93].

Our main questions revolve around identifying potential “hotspots” in the anthocyanin biosynthesis pathway that frequently lead to pigmentation loss. This loss is not attributed to a higher mutation rate but is rather linked to an increased likelihood of mutation fixation. The relative importance of cis-regulatory and trans-regulatory changes during evolution is of huge interest beyond the flavonoid biosynthesis of plants [94, 95]. MYBs are known to be key regulators in the flavonoid pathway [58] and have been frequently implicated in phenotypic variation in flower color [96,97,98]. Previous studies suggested that MYB transcription factors are often responsible for evolutionary changes [58, 99, 100]. Our findings will generally help to understand whether biosynthesis pathways in plants are naturally shut off at the first committed step. This would be the logical position when performing metabolic engineering, to avoid substrate being channeled into a dead-end pathway and potentially causing the accumulation of toxic intermediates [101,102,103,104]. Given that DFR is often presented as the first committed step of the anthocyanin biosynthesis, it represents a critical point where interruptions can effectively avoid metabolic flux into a dead-end pathway. Specifically, our study addresses three questions: (1) Is the loss of anthocyanin pigmentation within species primarily caused by mutations in transcription factors or structural genes? (2) Is DFR at the start of the anthocyanin biosynthesis branch, more predisposed to be causative for variations in anthocyanin pigmentation loss compared to downstream genes such as ANS, arGST, or UFGT? (3) Do MYBs play a more frequent role in anthocyanin pigmentation loss compared to members of other transcription factor families? To address our research questions, we conducted a systematic intraspecific comparison in numerous flowering plant species, contrasting anthocyanin-pigmented varieties with their non-pigmented counterparts. An extensive literature screening was performed to identify reports of causal genes explaining pigmentation differences between varieties of a plant species or, in rare cases, between closely related (sub)species. In instances where studies lacked conclusive results, we scrutinized data availability and conducted reanalysis when feasible. The quantification of different genes associated with the pigmentation loss supports the crucial role of transcription factors, particularly MYBs.

Results

Taxonomic distribution of analyzed cases across plant families

To evaluate how well the analyzed data sets are distributed across different plant lineages, a taxonomic distribution analysis was performed. The 235 analyzed cases (see Additional file 1: Table S1), cover 53 plant families distributed over 31 orders (Fig. 2). Notably, the order Rosales, particularly the Rosaceae family, accounted for the highest number of studies (39) showcasing variations in pigmentation. Second were the Brassicaceae family with 24 reported cases, and the Orchidaceae, Fabaceae, and Solanaceae families each contributing 13 cases. Furthermore, the Ericaceae family appearing in 9 studies, and the Asteraceae and Liliaceae families, each appearing in 8 studies, were also noteworthy. While the Theaceae, Poaceae, and Paeoniaceae families were featured in 7 cases each. Other families, including Lamiaceae, Asparagaceae, Malvaceae and Caryophyllaceae, were identified in varying frequencies across the reviewed studies. This rich diversity in distribution of families and orders is crucial to reveal universal mechanisms explaining pigmentation differences within plant species. Additionally, it highlights the ecological and evolutionary significance of this morphological phenomenon across angiosperms. However, it is noteworthy to clarify that among the different studies the terms “varieties”, “lines”, and “cultivars” were often used interchangeably. In the literature, it was not consistently clarified whether these distinctions arose from horticultural/artificial interventions, or if those differences could be attributed to natural causes.

Fig. 2
figure 2

Tree topology is based on Li et al [95]

Phylogenetic tree displaying 428 angiosperm families. Each color range groups the families of an order. Families highlighted in bold red are those encompassed in the literature screening for anthocyanin pigmentation differences. The number of pigmentation difference cases is given in parentheses for each of these families

Genetic hotspots responsible for anthocyanin pigmentation differences

To find out whether mutations in specific anthocyanin biosynthesis genes are predominantly responsible for the loss of anthocyanin pigmentation, reports about pigmentation loss were screened. Based on a total of 235 analyzed cases (see Additional file1: Table S1), we determined the genes most probable to be responsible for color variation between accessions in each of these species. Four of the included studies did not report one causal gene, but provided the necessary data for a reanalysis (Additional file 2). In the schematic representation (Fig. 3), we defined pigmentation to be the wild type state, while absence of anthocyanin was defined to be the result of a mutation. We identified 13 cases in which upregulated structural genes in pathways competing for substrate with the anthocyanin biosynthesis as the cause of color difference between unpigmented and anthocyanin-pigmented accessions have been reported. Additionally, 58 events of non-activated, down-regulated, non-functional, or lost structural anthocyanin biosynthesis genes were reported in the literature. Moreover, in 147 different cases a transcription factor was proposed to be responsible for differences in pigmentation. Many of these reports named a specific transcription factor. In total, 82 MYBs (activators and repressors), 10 bHLHs, two TTG1 homologs, one bZIP, one WRKY were presented as the causal gene for pigmentation differences. The remaining 49 cases are probably due to the action of multiple transcription factors or caused by TFs that activate the components of the MBW complex. Ten reports presented genes that encode proposed intracellular transporters of anthocyanins as best candidates such as MATE and possibly GST. It was not possible to determine the causal gene in 17 of the analyzed studies.

Fig. 3
figure 3

Schematic representation of the flavonoid biosynthesis pathway with emphasis on the number of cases in which a particular gene was responsible for color difference according to a systematic literature analysis and re-analyses of RNA-Seq data sets. The anthocyanin-pigmented accession is set as reference when determining up- and down-regulation. Blue boxes and red boxes indicate the number of up-regulated and down-regulated/non-functional/lost genes, respectively. Down-regulated genes are placed in one group with genes that lost their function due to mutations in the coding sequence or completely lost genes, because the ultimate function of the gene is lost in any of these cases. Flavonoids were divided into four groups that are indicated by color shading: flavones, flavonols, anthocyanins, and proanthocyanidins. Black bold letters represent the different genes encoding the enzymes and transporters involved in the pathway. CHS - chalcone synthase, CHI - chalcone isomerase, F3H - flavanone 3-hydroxylase, F3’H - flavonoid 3’-hydroxylase, F3’5’H - flavonoid 3’,5’-hydroxylase, DFR - dihydroflavonol 4-reductase, ANS - anthocyanidin synthase, UFGT - UDP-glucose: flavonoid 3-O-glucosyltransferase, FLS - flavonol synthase, LAR – leucoanthocyanidin reductase, ANR - anthocyanidin reductase, FNS - flavone synthase, arGST – anthocyanin-related glutathione S-transferase, MATE – multidrug and toxin extrusion, anthocyanin MYB – anthocyanin MYB activator, bHLH – basic Helix-Loop-Helix, bZIP – basic leucine zipper, TTG1- TRANSPARENT TESTA GLABRA1, WRKY- WRKY DNA-binding domain, uTF – unclassified transcription factor, NA – undetermined genetic factor. GST is presented twice due to the recently reported enzymatic function of arGSTs by Eichenberger et al. [38]. It is currently not clear if GST functions only as an enzyme or if it also plays a role in anthocyanin transport

Genetic factors of anthocyanin loss across plant families

To evaluate whether different genes are predominantly responsible for the loss of anthocyanin pigmentation in different plant lineages, the analysis described for all data sets above was also performed for individual lineages. The genes that have been identified as influential factors in driving variation were graphically represented along with the families in which they have been reported (Fig. 4). This visualization aims to uncover potential associations between specific genes and their prevalence across different plant lineages.

A noteworthy observation is the prominence of anthocyanin biosynthesis activating MYB transcription factor genes (classified as AnthoMYBact), which have been reported in 75 cases across all plant families. Families with a particularly high prevalence of AnthoMYBact are Rosaceae, Brassicaceae, Orchidaceae, Liliaceae, Solanaceae, and Asteraceae. On the contrary, some genes such as anthocyanin biosynthesis repressing MYBs (AnthoMYBrep), transcription factor bZIP, enzyme CHI, and FlavonolMYB appear in fewer instances, indicating a rare involvement in color variation. By juxtaposing genes associated with anthocyanin loss with the respective families where they have been observed, we aim to discern any family-specific patterns.

Fig. 4
figure 4

Number of cases each gene is implicated in anthocyanin differences reported in the literature (Additional file 1: Table S1) resolved by family. ANR – anthocyanidin reductase, ANS – anthocyanidin synthase, AnthoMYBact – anthocyanin MYB activator, AnthoMYBrep – anthocyanin MYB repressor, bHLH – basic Helix-Loop-Helix, bZIP – basic leucine zipper, CHS - chalcone synthase, CHI – chalcone isomerase, DFR – dihydroflavonol 4-reductase, F3’5’H – flavonoid 3’,5’-hydroxylase, F3’H – flavonoid 3’-hydroxylase, F3H – flavanone 3-hydroxylase, FLS – flavonol synthase, FNS – flavone synthase, GST (arGST) – anthocyanin-related glutathione S-transferase, uTF – unclassified Transcription Factor, TT12 – TRANSPARENT TESTA12, TTG1– TRANSPARENT TESTA GLABRA1, UFGT – UDP-glucose: flavonoid 3-O-glucosyltransferase, WRKY– WRKY DNA-binding domain, N.A – undetermined genetic factor

Overrepresentation of genetic factors causing anthocyanin loss

The hypothesis that DFR might be more often responsible for an anthocyanin loss than ANS or other structural genes in the anthocyanin biosynthesis was tested. Among the 235 cases analyzed (see Additional file 1: Table S1), DFR exhibited the highest frequency, being reported in 21 cases, while ANS was identified in three cases as the gene reported as causal for the color variation. The number of cases reporting DFR as the primary factor for pigmentation differences is significantly higher than the number of cases reporting ANS (χ² test, adjusted p-value = 0.0014).

To further examine whether this dominance of DFR extends to subsequent anthocyanin genes, we compared DFR with arGST which also revealed a significant difference (χ² test, adjusted p-value = 0.0228). Similarly, when examining the relationship between DFR and UFGT, a notable difference was again detected (χ² test, adjusted p-value = 0.0078). This suggests that DFR is the most important target of evolutionary events blocking the anthocyanin pathway through mutations in structural genes.

To investigate if the color variation could be attributed to the substrate competition between the two enzymes FLS and DFR, we examined the prevalence of cases revealing FLS and DFR as the primary genes influencing color variation in plant tissue. While the mechanisms underlying FLS up-regulation and DFR down-regulation are different, the metabolic consequences in terms of relative FLS to DFR enzymatic activity are similar. A hyperactivation of FLS was identified in 10 cases as the factor responsible for anthocyanin pigmentation loss which is significantly lower than the number of 21 cases in which a DFR down-regulation/loss was responsible (χ² test, adjusted p-value = 0.048). This suggests a more pronounced influence of DFR in the observed color variations, implying a potentially pivotal role in the genetic mechanisms governing anthocyanin production. It further suggests that a down-regulation or silencing of DFR is more strongly associated with the white coloration of plant tissues compared to an increased activity in expression of an FLS gene and subsequent production of flavonols.

To understand the relative importance of different transcription factors in anthocyanin loss events, the numbers of observed cases were compared as described above for the structural genes. Similar to the comparison between the structural genes, the differences between the TFs reported to be the causal factor of anthocyanin pigmentation differences were analyzed. The chi-square analysis revealed that the frequency of anthocyanin biosynthesis activating MYBs appearing as causal gene of color variation is significantly higher than the presence of other transcription factors such as bHLH, WRKY, TTG1, bZIP, and others (χ² test, adjusted p-value = 9.66e−10). Even when compared against the large group of unclassified TFs, we observed that the presence of MYBs is significantly higher (χ² test, adjusted p-value = 0.0228). In summary, these findings collectively highlight the prevalent role played by the MYB transcription factor family in influencing the observed pigmentation variations.

Discussion

Anthocyanins are one of the main factors responsible for color variation in plant tissues, particularly in flowers. The variation in floral coloration can occur as a result of plant adaptation to different biotic and abiotic conditions, but interactions with pollinators might be the most important function of anthocyanins in flowers [105, 106]. In some cases, flower coloration changes due to visitation by insects [107]. Flavonoids are also known to protect against UV radiation [108], drought [109], and cold stress [26, 110]. Previous studies have reported numerous genes responsible for anthocyanin pigmentation differences within a plant species (Additional file 1: Table S1). We present an aggregated analysis of the most likely candidate genes responsible for color variation. This analysis harnessed public RNA-Seq data sets that enabled a direct comparison of anthocyanin biosynthesis gene activity between differently pigmented accessions and the taxonomic family they belonged to. This comparison evaluates whether specific genes are responsible for pigmentation loss in certain lineages. The complex interplay between specific genes and their role in shaping plant pigmentation has been a subject of investigation in numerous studies [90, 100, 111,112,113,114] and has been experimentally tested in families such as Solanaceae [113], as well as in specific genera like Ipomoea [73, 75], Iris [115], Antirrhinum [116], and Petunia [93].

DFR is the block ‘hotspot’ in the anthocyanin branch

DFR activates the conversion of dihydroflavonols to leucoanthocyanidins, which is often considered as the first committed step of the anthocyanin biosynthesis. Through a comprehensive literature survey, we revealed that DFR is more often harboring a disruptive variation that results in a block of the anthocyanin accumulation in colorless varieties than any downstream gene in the anthocyanin biosynthesis. A transition from red/purple to white/cream flower color would require some kind of blockage in the anthocyanin production, which probably occurs upstream of anthocyanidin formation [117]. Leucoanthocyanidins can be catalyzed to form two different products, catechins via LAR and anthocyanidins via ANS. If the anthocyanin biosynthesis is blocked at the ANS step, the product to be formed would be catechins. This redirection in the metabolic flow would form proanthocyanidins instead of anthocyanins and could ultimately result in brownish pigmentation due to polymerized and oxidized proanthocyanidins [118].

A study conducted by Rausher et al. [119] proposed that the evolutionary rate of enzymes depends on their location in a pathway with early genes showing a slower evolution rate, but this has been contradicted [120,121,122]. According to the Arabidopsis Information Resource (TAIR), the length of the coding sequence (CDS) of DFR is 1149 bp (accession: NM_123645) and of ANS is 1071 bp (accession: NM_118417). To the best of our knowledge and based on the low length difference between the DFR and ANS coding sequences, there is no evidence that the occurrence of a mutation in DFR is substantially more likely than a mutation in ANS. While the mutation rate in both genes is probably equal, the rate of mutation fixation might be very different. According to theories of metabolic regulation, it is evolutionary beneficial to have blocks at the first committed step of a branch in a biosynthesis pathway in order to avoid a waste of energy and resources by pushing substrate into a blocked pathway [123, 124]. This could explain why DFR and not ANS, arGST, or UFGT appears frequently in analyzed cases of intraspecific anthocyanin pigmentation differences.

Cross talk and substrate competition: anthocyanins vs. flavonols

Plants have multiple mechanisms to regulate their metabolism in response to environmental conditions and availability of resources. Substrate competition is among the factors determining the color variation observed in plant tissues [19]. Metabolically, this can occur when two enzymes or transporters compete for the same or very similar substrates [123, 125]. DFR and FLS are both catalyzing reactions that utilize dihydroflavonols, but lead to different products. While DFR generates colorful anthocyanins, FLS produces colorless flavonols. There are three different types of dihydroflavonols namely dihydrokaempferol, dihydroquercetin, and dihydromyricetin that differ in their hydroxylation pattern. Different isoforms of DFR and FLS have preferences for specific hydroxylation patterns which could be a mechanism to avoid direct substrate competition [125]. The relative activities of F3H, F3’H, and F3’5’H determine the intracellular levels of the three dihydroflavonols. Our analyses revealed that variation associated with DFR is more often responsible for a color change than variation associated with FLS. In total, 21 cases revealed that the low expression of DFR is responsible for the color contrast between unpigmented tissues and those that show anthocyanin pigmentation. Only ten cases showed an increased FLS activity as the cause of pigmentation loss. A high expression of FLS leads to an accumulation of colorless flavonols instead of colorful anthocyanins as reported previously in several plant species [126]. A recent study identified a flavonol biosynthesis regulating MYB as the most frequently affected gene in pigmentation pattern change [122]. Loss of expression or loss of a gene function can be the consequence of many different mutations and thus be more likely to happen than a gain-of-function mutation. It is also feasible that some researchers only investigated the classical anthocyanin biosynthesis genes when looking for a molecular mechanism to explain the anthocyanin pigmentation difference thus leading to an observation bias concerning the responsible genes. However, this is unlikely to explain the strong difference between hotspots like DFR and MYB and other genes. Once the anthocyanin biosynthesis is disrupted, selection against additional mutations in the anthocyanin biosynthesis genes might be weak or completely absent. This could result in the accumulation of secondary mutations. A number of additional mutations in the anthocyanin biosynthesis would increase the chances that at least one of them is picked up by researchers looking at anthocyanin biosynthesis genes. Performing future analyses by inspecting a more comprehensive gene set could make the identification of causal genes in color difference studies more accurate.

Transcription factor variations appear frequently as block to anthocyanin accumulation

It is well known that the transcriptional activity of structural anthocyanin biosynthesis genes is regulated by a ternary complex consisting of a MYB, a bHLH and a WD40 protein (MBW complex). The anthocyanin biosynthesis is even considered a model system for transcriptional control in eukaryotes. Previous studies identified transcriptional activation of different R2R3-MYBs [127,128,129,130] and bHLHs [131,132,133] as the cause for the increase in anthocyanin levels. Our results showed that transcription factors were three times more often reported as causal factors of color differences than structural genes. While we cannot rule out the possibility that structural genes can also accumulate variation prior to the reduction in transcription, this observation is in line with a previous study that observed faster evolutionary changes in transcription factors than in structural genes [124]. Similarly, Wheeler et al. [122]. showed that transcription factors, particularly MYBs, presented lower levels of gene expression with higher molecular evolutionary rate compared to their targeted structural genes, and suggested a negative correlation between evolution rate and gene expression in the Petuniae tribe [122]. This premise commonly known as the E-R anticorrelation, has been widely studied across different organisms, including yeast [134, 135], Arabidopsis [136], Brassica [137], Barley [138], Arachis [139], and Drosophila [140]. However, the hypotheses explaining this model are still a topic of debate.

The loss of a transcription factor can switch off an entire biosynthesis pathway, while genes involved in this pathway could still be activated by other transcription factors to harness their activity in a different metabolic context. For example, DFR and ANS, two important anthocyanin biosynthesis genes, are also required for the biosynthesis of proanthocyanidins. A selective loss of anthocyanins and maintenance of the proanthocyanidin biosynthesis can not be caused by the loss of DFR or ANS. A well known example for such a scenario is the conservation of DFR and ANS across betalain-pigmented lineages of the Caryophyllales [141], which do not accumulate anthocyanins [80, 126].

Studies on evolutionary rates investigating the components of the MBW complex suggested that MYBs would be the most probable component to be lost due to the highest degree of specialization which coincides with a lower pleiotropy [74, 122, 142]. It was observed that insertions/deletions were the most frequent mutation events in MYBs, while amino acid substitutions in the conserved region appeared irrelevant [73]. This led to the hypothesis that amino acid substitutions in transcription factors might not be relevant for the pigment evolution context, while InDels could disrupt the function of the encoded protein [73]. However, a recent investigation suggests a high importance of amino acid substitutions in the R3 interaction domain of MYBs in the loss of anthocyanin pigmentation in betalain-pigmented Caryophyllales [141]. These amino acid substitutions alter a highly conserved region that is considered crucial for the interaction of MYB and bHLH protein in the MBW complex which is required for activation of anthocyanin biosynthesis genes [143,144,145]. The lack of a functional MBW complex is considered as one crucial factor for the loss of anthocyanin pigmentation in the betalain-pigmented Caryophyllales [141]. In summary, there is evidence for InDels and amino acid substitutions as mechanisms that can disrupt the function of MYBs involved in the pigmentation biosynthesis regulation.

The frequency of mutations in TFs has been studied in various species. For example, a study of the A. thaliana genome revealed that there are more than 2,000 genes that encode for TFs and that these genes were more prone to accumulate mutations than non-TF genes [146]. Another study examined the frequency of point mutations in TF genes in Escherichia coli and found that these genes were more vulnerable to harmful mutations that resulted in significant changes in gene expression than non-TF genes [147]. Given that their activity covers a wide range of functions, transcription factors can regulate and affect the expression of structural genes without the need for high gene expression levels [122]. This multifunctionality becomes apparent when examining specific TF families, such as MYBs, which are crucial in the regulation of the anthocyanin biosynthesis. In this context, a recent study by Liang et al. discovered a single point mutation in the 5’-UTR of PELAN, an anthocyanin-activating R2R3-MYB, as causal mutation for the loss of pigmentation in Mimulus parishii [148]. The results of their experiments concluded that despite the similar transcript abundance of PELAN in both strong pigmented and low pigmented cultivars, the difference in phenotype was due to a mistranslation of the PELAN mRNA [148].

The number of TFs required to regulate a specific enzyme-encoding gene is a complex and dynamic process that is influenced by various factors [149], including the complexity of the regulatory region [150], the stage of development or cell type [151], and environmental factors [152]. For example, only about ten R2R3-MYBs are known to bind specific DNA motifs related to the regulation of the flavonoid biosynthesis pathway in A. thaliana [57, 153]. In a review study, Feller et al [70]. performed a comparative analysis of the transcription factor families MYB and bHLH. They concluded that the reason why the bHLH family contains one of the largest numbers of transcription factors in plants is due to their functional diversification [70]. Multiple bHLH proteins contain a similar ligand-binding domain targeting different enzyme encoding genes [154]. A greater proportion of MYBs were recognized to be responsible for the regulation of flavonoid biosynthesis genes in comparison to bHLHs [154]. Our results align with this observation, because MYBs were reported as the causal gene of color differences in 75 cases, while bHLHs were only reported as a crucial factor in ten cases. If bHLHs are more often involved in multiple processes, their loss would be more detrimental which makes it less likely to occur. Similar explanation can be inferred from contrasting the reported cases involving MYBs with those involving other TF, such as, TTG1, WRKY, and bZIP. Given that most classified transcription factors are MYBs, it is expected that most of the cases with unidentified transcription factors would actually be due to anthocyanin biosynthesis activating MYBs.

Conclusion

Our systematic literature screening supported the assumption that variations in transcription factors are the most frequently observed blocks in the anthocyanin accumulation in cases of intraspecies pigmentation differences. Specifically, the MYB components of the MBW complex exhibited dominance in influencing anthocyanin accumulation variations among differently pigmented accessions. The degree of transcription factor specialization for a certain pathway seems to determine the frequency of their implication in color differences with more pleiotropic TFs like bHLH and TTG1 having a lower relevance. According to our results, MYBs are most often responsible for the difference in anthocyanin content, followed by bHLH, and other TFs. When structural genes appeared to be responsible for the absence of anthocyanins, this was most often a lack of DFR activity. An increased activation of the flavonol biosynthesis as a pathway competing for substrate with the anthocyanin biosynthesis was seen in rare cases. Therefore, our findings highlight the pivotal role of transcription factors, particularly MYBs, in determining anthocyanin content differences within species.

Methods

Extensive literature screening

An extensive record identification was performed in electronic databases (PubMed, Google Scholar, JSTOR) using specific screening keywords: “flower pigmentation differences”, “leaf pigmentation”, “color difference”, and “anthocyanin loss”. A total of 230 full-text articles were included for an eligibility assessment between December 2021 to October 2023. Accessible articles were considered if the genetic basis of pigmentation was investigated in the respective study (Additional file 1: Table S1). The evidence for causal genes reported in the literature differs between studies. We classified the presented evidence into the following main categories: ‘knockout mutant’, ‘in vitro characterization’, ‘coexpression patterns’, ‘metabolic exploration’, among others (Additional file 1: Table S1). Additionally, the respective family and order of each investigated species were collected along with the names of accessions, varieties, lines, or cultivars involved in the study. Most studies compared accessions of the same species that differed in anthocyanin pigmentation thus we are predominantly exploring intraspecific mechanisms of anthocyanin loss. While studies were classified as intraspecific or interspecific, we refrained from separate analyses due to a low sample size. As the classification of plants into categories like accessions, subspecies, and closely related species might leave some room for discussion, we considered all these studies.

Data sources

From the 230 different articles, four studies were selected for in-depth transcriptome re-analysis. The four studies generated RNA-Seq data sets for the analysis of genetic factors underlying differences in pigmentation. The analyzed species were Michelia maudiae [155], Rhododendron obtusum [156], Trifolium repens [157], and Hosta plantaginea [158] (Additional file 2). The selection of each dataset was based on the following criteria: (I) paired-end RNA-Seq data, (II) study has biological replicates, and (III) the authors have not identified the specific gene responsible for the color difference. The RNA-Seq datasets of those four plant species were retrieved from the Sequence Read Archive (www.ncbi.nlm.nih.gov/sra) (Additional file 1: Table S2) using fastq-dump [159].

De novo transcriptome assembly

Transcriptomic data sets of four plant species were re-analysed to identify a candidate gene that could explain the absence of anthocyanins in one accession of each of these species. The generation of de novo transcriptome assemblies was necessary, because no transcriptome or genome sequences of these species were publicly available. Trimmomatic v0.39 [160] was used to remove adapter sequences, to eliminate leading and trailing low-quality reads with quality below 3 (LEADING:3, TRAILING:3), and to drop reads shorter than 36 nt (MINLEN:36). The IDs of all remaining reads were modified by a customized Python script [161] to enable the following assembly with Trinity [162]. Trinity v2.4.0 [162] was applied for de novo transcriptome assembly using the previous cleaned reads as input. Trinity was run with a k-mer length of 25. In order to validate the quality of the transcriptome assemblies, a summary of the assembly statistics was generated for each species. The assembly statistics were computed using a previously developed Python script [163] (Additional file 1: Table S3).

After completion of the assembly process, kallisto v0.44 [164] was run to quantify the abundances of transcripts based on all available RNA-Seq data of the respective species. Similarly, a principal components analysis (PCA) for every dataset was performed based on their transcriptomic profiles to compare the similarity between samples and to identify any outliers (Additional file 3: Figures S1-S4). PCA was performed using R v.4.1.3 [165] with the package ggplot2 v.3.4.0 [166] to inspect the variation within the data set (Additional file 3: Figures S1-S4).

Identification of candidate genes

In order to facilitate the identification of candidate genes, encoded peptide sequences were inferred from the transcriptome assembly using a previously established approach [167] that combines Transdecoder [168], ORFfinder [169], and ORFpredictor [170]. Knowledge-based Identification of Pathway Enzymes (KIPEs3) v0.34 [171, 172] was applied to identify the structural genes involved in the flavonoid biosynthesis. Flavonoid biosynthesis regulating R2R3-MYB genes were identified via MYB_annotator v0.3 [173]. A previously described BLAST-based Python script [174] was deployed for the identification of additional candidate genes using bHLH, WD40, and WRKY genes associated with the flavonoid biosynthesis as baits [161]. A complete list of the selected candidate genes can be found in Additional file 1: Tables S4-S7.

Phylogenetic trees were constructed to identify all isoforms that belong to the same gene. Isoforms of the same gene may differ by the presence or absence of exons, but they should not exhibit more sequence differences than those accounted for by sequencing errors. In a phylogenetic context, transcript isoforms should form a distinct monophyletic group that can be replaced by one representative sequence. Phylogenetic trees were constructed with FastTree v.2.1.11 [175] based on a MAFFT v7.475 alignment of polypeptide sequences using default parameters. Additional trees for comparison and additional support were constructed using IQ-TREE v.1.6.12 [176] using default parameters and RAxML v.8.2.12 [177] with PROTGAMMA + LG + F. Phylogenetic trees for the transcription factor families MYB, bHLH, TTG1, and WRKY were constructed to assess orthologues relationships (Additional file 3: Figures S8-S17). Sets of outgroup sequences were compiled based on reports in the literature in order to have a backbone of functionally characterized sequences for each tree. These sequences have been associated with the flavonoid biosynthesis in previous studies and were taken from plant species closely related to those explored with transcriptome assemblies.

Taxonomic distribution of analyzed species

A plastid-based phylogenomic tree was used to study the distribution of the variations-related cases across all flowering plant families. The backbone tree was taken from Li et al [178] and modified with iTOL v.6.8.1 [179] to highlight the represented orders and families in our dataset.

Gene expression analyses

Transcriptome assemblies usually generate a huge number of alternative transcript isoforms per gene. The initial sequences of the transcriptome assembly were used as reference for the quantification, but the transcript abundance (‘gene expression’) values were summarized per gene. Gene expression information of the entire monophyletic group was mapped to this representative transcript during the generation of heatmaps. Heatmaps displaying the candidate genes and their respective abundance as transcripts per million (TPM) were constructed using the R packages ComplexHeatmap v.2.10.0 [180], circlize 0.4.15 [181], and dplyr 1.1.0 [182]. Genes with an adjusted p-value < 0.05 and absolute log2 fold-change > 1 were considered as differentially expressed. The script used for the heatmap construction is available in our GitHub repository [161]. A workflow of the methods used in the transcriptome analysis is described in Fig. 5.

Fig. 5
figure 5

Flowchart representation of the methodology followed in the comparative transcriptional analysis. SRA – Sequence Read Archive, KIPEs – Knowledge-based Identification of Pathway Enzymes, MYB - Myeloblastosis, bHLH – basic helix-loop-helix, TTG1 – TRANSPARENT TESTA GLABRA1, WRKY – WRKY DNA-binding domain