Introduction

Plant-specific transcription factors (TFs) of the WRKY family are integral players in numerous vital biological processes, spanning growth and development, responses to biotic and abiotic stress, and the biosynthesis of secondary metabolites (Khoso et al. 2022; Rushton et al. 2010; Ulker and Somssich 2004; Wang et al. 2023). The WRKY domain, encompassing 60 highly conserved amino acids, features a distinctive WRKYGQK heptapeptide at the N-terminal end, while the C-terminal end harbors a C2H2 (CX4-5CX22-23HXH) or C2HC (CX7CX23HXC) zinc finger-like motif (Eulgem et al. 2000; Rushton et al. 2010). The classification of WRKY proteins is based on the number of WRKY domains and the type of zinc-finger motif. Specifically, group I WRKYs possess two WRKY domains, while groups II and III contain one such domain. Additionally, group II WRKYs feature a C2H2 zinc-finger motif, whereas group III WRKYs exhibit a C2HC motif. Further categorization within group II WRKY includes five subgroups (IIa, IIb, IIc, IId, and IIe). Despite the universal presence of a conserved WRKY domain among plant WRKYs and their interaction with a unique DNA-binding site (TTGACC/T, W-box), these TFs play diverse roles in various plant biological processes (Ulker and Somssich 2004; Wang et al. 2023).

In soybean (Glycine max), expansion of the WRKY transcription factor family has been driven by duplication events: 76.7% (102/133) of GmWRKYs are segmentally duplicated genes and 13.5% (18/133) are tandemly duplicated genes, underscoring the major contribution of segmental duplication to the proliferation of GmWRKYs (Yin et al. 2013). A comparative analysis across 12 legumes revealed that orthologous WRKYs evolve faster than paralogous WRKYs while undergoing purifying selection (Song et al. 2018). The diploid ancestors of peanut possess 75–77 WRKY members, whereas their tetraploid descendants have expanded to 131–158 WRKY members, with the loss of old WRKY TFs and retention of new ones. This expansion has been further shaped by distinct domestication processes affecting WRKY evolution in cultivated peanut varieties (Chen et al. 2023). In galegoid legumes, the WRKY repertoire varies, as exemplified by 78 chickpea (Cicer arietinum) CaWRKYs and 98 barrel medic (Medicago truncatula) MtWRKYs; tandem duplication events play a larger role in the expansion of MtWRKYs than of CaWRKYs (Kumar et al. 2016). In contrast, in adzuki bean (Vigna angularis) and mung bean (Vigna radiata), tandem and segmental duplication events have not significantly influenced WRKY expansion, as observed for the 84 VaWRKYs and 85 VrWRKYs (Srivastava et al. 2018). The amplification of group III Oryza sativa WRKY (OsWRKY) genes is attributed to both tandem and segmental gene duplications, a phenomenon that is less pronounced in Arabidopsis (Wu et al. 2005). Furthermore, evidence suggests that WRKY expansion predates the divergence of monocots and dicots (Wu et al. 2005), consistent with the view that WRKY genes originated in early eukaryotes and underwent substantial expansion in plant genomes (Zhang and Wang 2005). Within individual plant genomes, distinct patterns of WRKY expansion emerge. In the tomato (Solanum lycopersicum) genome, 16% of the 81 identified SlWRKYs are tandemly duplicated genes (Huang et al. 2012). In pineapple (Ananas comosus), the expansion of the 54 AcWRKYs is attributed to segmental duplication events (Xie et al. 2018). In maize (Zea mays), segmentally duplicated genes (56.3%, 72/128) and tandemly duplicated genes (2.3%, 3/128) both contribute to the diversification of the ZmWRKY family (Tang et al. 2021).

WRKY TFs play a crucial role in modulating flavonoid biosynthesis across various plant species. For instance, in red-fleshed apples, MdWRKY11 promotes anthocyanin biosynthesis by activating the expression of key genes such as Malus domestica v-myb avian myeloblastosis viral oncogene homolog 10 (MdMYB10), Malus domestica flavonoid 3-O-glucosyltransferase (MdUF3GT), and Malus domestica ELONGATED HYPOCOTYL 5 (MdHY5) (Liu et al. 2019). Similarly, in grapes, Vitis vinifera WRKY26 (VvWRKY26), a homolog of Arabidopsis thaliana TRANSPARENT TESTA GLABRA 2/Arabidopsis thaliana WRKY44 (AtTTG2/AtWRKY44), functions as a positive regulator of proanthocyanidin biosynthesis (Amato et al. 2019; Amato et al. 2017). In cotton, the group IIc Gossypium hirsutum WRKYs (GhWRKYs) are implicated in promoting Gossypium hirsutum MAP kinase kinase 2 (GhMKK2), which in turn regulates pathogen-induced flavonoid biosynthesis and contributes to resistance against Fusarium oxysporum (Wang et al. 2022). However, the specific roles of duplicated WRKYs in regulating secondary metabolites remain unclear. Weighted gene co-expression network analysis (WGCNA) is a powerful and widely used method for identifying coexpressed genes or genes associated with specific traits (Li et al. 2022; Qian et al. 2023). WGCNA may therefore help identify candidate GsWRKYs involved in the biosynthesis of schaftoside in Grona styracifolia.

The perennial subshrub G. styracifolia, previously known as Desmodium styracifolium, is closely related phylogenetically to soybean. As a traditional Chinese medicinal herb, G. styracifolia has shown positive effects in treating cholelithiasis and urolithiasis, primarily attributed to its bioactive ingredient, schaftoside. Previous studies have reported diverse health benefits of schaftoside, including a protective role against cholesterol gallstone formation in a lithogenic diet-induced C57BL/6 mouse model (Liu et al. 2017a) and protective effects against non-alcoholic fatty liver induced by a high-fat diet in mice (Liu et al. 2017b). Computational studies have suggested schaftoside as a natural inhibitor of human respiratory syncytial virus (Kant et al. 2018). Furthermore, molecular docking analyses have placed it among the top 10 bioactive components, screened from 318 phytochemicals, with potential bioactivity against COVID-19 (Joshi et al. 2020). Recent research has further confirmed that schaftoside inhibits the 3CLpro and PLpro enzymes of the SARS-CoV-2 virus (Yi et al. 2022). Schaftoside is therefore a health-promoting component whose biosynthesis and regulation in G. styracifolia deserve exploration. The biosynthesis of schaftoside shares enzymes with the flavonoid pathway, including chalcone synthase (CHS), chalcone isomerase (CHI), and flavanone 2-hydroxylase (F2H). A two-step di-C-glycosylation biosynthetic pathway for the flavone-di-C-glycoside schaftoside, involving successive catalysis by two C-glycosyltransferases (CGTs), has recently been identified (Wang et al. 2020). However, despite these insights into the biosynthetic pathway, the regulatory mechanisms governing schaftoside biosynthesis in G. styracifolia remain unclear.

In this study, we identified 102 GsWRKY genes in the G. styracifolia genome, 82 of which are duplicated genes, including 11 tandemly duplicated GsWRKYs. GsCGT is coexpressed with 16 GsWRKYs, including GsWRKY95 and five pairs of segmentally duplicated GsWRKYs. Using dual-luciferase assays (DLA) and yeast one-hybrid (Y1H) assays, we confirmed that GsWRKY95 activates the expression of GsCGT. The GsCHSs that are coexpressed with GsWRKY95 and GsCGT are also associated with 24 GsWRKYs, notably including 11 pairs of segmentally duplicated Group II GsWRKYs. This association points to an intricate and interconnected regulatory network governing the biosynthesis of schaftoside in G. styracifolia. The involvement of segmentally duplicated Group II GsWRKYs in this coexpression network highlights their specific role in schaftoside production and contributes to a more nuanced understanding of the regulatory dynamics underlying the biosynthesis of this bioactive compound in G. styracifolia.

Materials and methods

Plant materials

The G. styracifolia specimens used in this study were cultivated at the South China Botanical Garden, Chinese Academy of Sciences (Guangzhou, P. R. China). Root, stem, and leaf tissues were collected from 8- or 10-month-old G. styracifolia plants cultivated in the field. These samples were designated as 10ML (10-month-old leaf), 10MR (10-month-old root), 10MS (10-month-old stem), and 8ML (8-month-old leaf). For light-treated samples, G. styracifolia seedlings were cultivated under a 16 h light/8 h dark photoperiod at 23 ± 1°C with an illumination intensity of 3000 lx for 14 d following germination. Subsequently, a portion of the seedlings (the control group, CK) continued to grow under the same conditions for an additional 14 d, while another portion was subjected to high-light (HL) treatment for 14 d. Root (HLR and CKR), stem (HLS and CKS), and leaf (HLL and CKL) samples were harvested from these seedlings for further analysis. The seedlings were cultivated in pots containing a soil mixture (peat soil:perlite:vermiculite = 3:1:1). Samples of each tissue were collected from a minimum of three individual plants, and each sample comprised at least three biological replicates. Immediately after collection, all samples were flash-frozen in liquid nitrogen and stored until RNA extraction. All collected samples were used for RNA-Seq transcriptomic analysis and/or real-time quantitative polymerase chain reaction (RT-qPCR) assays.

Identification of G. styracifolia WRKY family genes and their sequence features

The GsWRKY protein sequences were sourced from the National Genomics Data Center (NGDC, https://ngdc.cncb.ac.cn/?lang=zh) under accession number PRJCA016945. Arabidopsis AtWRKY protein sequences were obtained from the TAIR website (https://www.arabidopsis.org/). To identify WRKY domain-containing proteins, the alignment file (Stockholm format) for the WRKY domain PF03106 was retrieved from the Pfam protein database (http://pfam.xfam.org/family/PF03106#tabview=tab3). Using HMMER 3.0 (http://eddylab.org/software/hmmer3/3.0/hmmer-3.0), the Stockholm file was converted to a Hidden Markov Model (HMM) profile, and hmmsearch was employed for preliminary WRKY protein identification with default parameters and a cutoff value of 0.01. All candidate WRKY protein sequences identified in this step were then checked again for the presence of the WRKY domain using the program pfam_scan.pl with a cutoff value of 1E-10. To validate the core domains in the candidate WRKY proteins, NCBI Batch CD-search (https://www.ncbi.nlm.nih.gov/Structure/bwrpsb/bwrpsb.cgi), SMART (http://smart.embl.de/smart/set_mode.cgi?NORMAL=1), and iTAK-online (http://itak.feilab.net/cgi-bin/itak/online_itak.cgi) were employed. Proteins with incomplete domains and redundant sequences were manually removed. Finally, the ExPASy website (https://web.expasy.org/protparam/) was used to calculate the sequence length, isoelectric point (pI), molecular weight (MW), and grand average of hydropathicity (GRAVY) of the confirmed GsWRKY proteins.
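As an illustration of the last step, GRAVY is commonly defined as the mean Kyte-Doolittle hydropathy value over all residues of a protein, with negative values indicating a hydrophilic protein. The sketch below is a minimal stand-alone version of that calculation and is not the ProtParam implementation; the function name gravy and the example peptide (the WRKYGQK heptapeptide alone, not a full GsWRKY sequence) are chosen here purely for illustration.

```python
# Kyte-Doolittle hydropathy scale (standard published values per residue).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Return the grand average of hydropathicity of a protein sequence."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    # Sum of per-residue hydropathy values divided by sequence length.
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


# Illustrative input only: the conserved heptapeptide, not a real protein.
heptapeptide_gravy = gravy("WRKYGQK")
print(f"GRAVY(WRKYGQK) = {heptapeptide_gravy:.3f}")
```

A negative result, as obtained for this strongly polar heptapeptide, is read in the same way as the negative GRAVY values reported for whole proteins.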

Phylogenetics, gene structure, motif, and cis-elements analysis

The subfamily classification of GsWRKYs was guided using AtWRKY protein sequences obtained from TAIR. Conserved domain sequences were extracted using Fasta Extract (Recommended) from TBtools (Chen et al. 2020). Multiple sequence alignments of the WRKY domains were carried out using ClustalW within MEGA 6.0 (http://www.megasoftware.net/mega6/) with default parameters. Phylogenetic trees were constructed utilizing the neighbor-joining (NJ) method in MEGA 6.0, employing the Poisson model, pairwise deletion, and 1000 bootstrap replications. Additionally, an NJ phylogenetic tree of the full length GsWRKY proteins was constructed using the same method. Information regarding the structure of introns and exons was obtained from the G. styracifolia genome database (GFF3 file). The identification of ten conserved motifs in GsWRKY protein sequences was conducted using MEME (http://alternate.meme-suite.org/tools/meme) with default parameters. PlantCARE (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/?tdsourcetag=s_pcqq_aiomsg) was employed to predict cis-elements within the 2000 bp promoter sequence of GsWRKYs. To provide an integrated view, the phylogenetic tree, gene structure, conserved motifs, and cis-elements of GsWRKYs were integrated using TBtools.

Chromosomal location and syntenic analyses for GsWRKY genes

The chromosomal location of each GsWRKY gene was extracted from the G. styracifolia genome annotation file (GFF3). Duplication events among GsWRKYs were analyzed using MCScanX within TBtools with default parameters (Chen et al. 2020). To visualize the synteny relationships of orthologous WRKY genes, the Dual Synteny Plot tool in TBtools was used with selected species, namely Oryza sativa, Arabidopsis thaliana, Cicer arietinum, Medicago truncatula, Lupinus albus, Lupinus angustifolius, and Glycine max (Chen et al. 2020). The ParaAT 2.0 script (Zhang et al. 2012) was used for batch alignment and conversion to AXT files. Subsequently, the non-synonymous (Ka) and synonymous (Ks) substitution rates for each duplicated WRKY gene pair were calculated using KaKs_Calculator with the MA method (Zhang 2022).

Heatmap clustering, coexpression analysis, and protein–protein interaction (PPI) prediction

To provide an overview of the expression patterns of GsWRKYs, gene expression levels were calculated using the FPKM method, and heatmap clustering was conducted using TBtools (Chen et al. 2020). The RNA-Seq data were sourced from NGDC under accession numbers PRJCA016945, SRR11144541, and PRJNA936623. To determine the coexpression relationships between GsWRKYs and genes involved in schaftoside biosynthesis (including GsCHS, GsCHI, GsCGT, and GsF2H), the WGCNA method was employed with default parameters (Langfelder and Horvath 2008). Coexpression relationships with an R value exceeding 0.6 were visualized using Cytoscape V3.1 (https://github.com/cytoscape/cytoscape). PPI networks were constructed in STRING (https://string-db.org/) using the Arabidopsis AtWRKY homologs of GsWRKYs, with an option value > 0.4. The G. styracifolia homologs of the interactors identified in Arabidopsis were then determined by reciprocal BLASTP analysis. The resulting network was visualized using Cytoscape V3.1.
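The thresholding step can be illustrated with a simplified sketch. It computes plain Pearson correlations between expression profiles and keeps gene pairs with R > 0.6 as network edges. This is a deliberate simplification and not the WGCNA procedure itself, which additionally applies soft-thresholding and module detection; the gene names and expression values below are hypothetical placeholders, not data from this study.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expression, threshold=0.6):
    """Return (gene1, gene2, r) for every gene pair with r > threshold."""
    edges = []
    for gene1, gene2 in combinations(sorted(expression), 2):
        r = pearson(expression[gene1], expression[gene2])
        if r > threshold:
            edges.append((gene1, gene2, round(r, 3)))
    return edges


# Hypothetical expression values across four samples (illustration only).
toy_expression = {
    "bait": [1.0, 2.0, 3.0, 4.0],
    "tf_1": [2.0, 4.1, 5.9, 8.2],   # rises with the bait gene
    "tf_2": [4.0, 3.0, 2.0, 1.0],   # falls as the bait gene rises
}
edges = coexpression_edges(toy_expression)
print(edges)
```

An edge list of this shape (source, target, weight) is the kind of table that can be imported into a network viewer such as Cytoscape.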

Real-time quantitative polymerase chain reaction (RT-qPCR)

To validate the expression levels of genes identified through RNA-Seq transcriptomes, RT-qPCR was performed using GsCCRP4 as the internal reference gene, as established in a prior study (Wang et al. 2021). The primers used in this study are listed in Table S1. The genes selected for comparison of expression patterns in roots, stems, and leaves were GsWRKY20, GsWRKY29, GsWRKY33, GsWRKY87, GsWRKY52, GsWRKY95, and GsCGT. Primers for RT-qPCR were designed using Primer 5, and the PCR products were confirmed to yield a single band by electrophoresis and a dissociation curve with a single peak. Following RNA integrity assessment and reverse transcription to cDNA, all samples were diluted to a concentration of 100 ng/μL. The RT-qPCR reactions were conducted on the Roche LightCycler® 480 system (Roche, Switzerland) in a total reaction volume of 10 μL using SYBR Green PCR MasterMix. The RT-qPCR program comprised 95°C for 10 min, followed by 40 cycles of 95°C for 10 s, 60°C for 10 s, and 72°C for 30 s. Gene expression was normalized to GsCCRP4 expression as the internal control, and relative transcript levels were calculated using the 2^−ΔΔCT method, as previously described (Wang et al. 2021).
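For reference, the 2^−ΔΔCT calculation can be written out as a short sketch. ΔCT is the target CT minus the reference-gene CT within one sample, and ΔΔCT is the ΔCT of the test sample minus the ΔCT of the calibrator sample. The function name and the CT values below are hypothetical and serve only to show the arithmetic; the sketch also assumes equal amplification efficiency for the target and reference genes, as the method itself does.

```python
def relative_expression(ct_target_sample, ct_reference_sample,
                        ct_target_calibrator, ct_reference_calibrator):
    """Relative transcript level by the 2^-ddCT method."""
    # Normalize the target gene to the reference gene in each sample.
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    # Compare the test sample with the calibrator sample.
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2 ** (-delta_delta_ct)


# Hypothetical CT values: the target crosses threshold two cycles earlier
# (relative to the reference gene) in the test sample than in the calibrator.
fold_change = relative_expression(22.0, 18.0, 24.0, 18.0)
print(fold_change)
```

With these illustrative inputs, ΔΔCT is −2, so the target transcript is four times as abundant in the test sample as in the calibrator.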

Dual-luciferase, yeast one-hybrid, and luciferase complementation imaging (LCI) assays

To validate the binding and activation of GsWRKY95 on GsCGT, DLA and Y1H assays were employed. For the DLA, two W-box cis-elements (W1 and W2) in the GsCGT promoter were mutated individually or together, resulting in three mutants (W1M-W2, W1-W2M, and W1M-W2M) (Li et al. 2023). The wild-type (W1-W2) and mutated cis-element sequences of GsCGT were synthesized by Beijing Tsingke Biotech Co., Ltd. and inserted into the pGreenII 0800-LUC reporter vector. The pGreenII 62-SK-GsWRKY95 construct was used as the effector. The recombinant vectors were transformed into Agrobacterium strain GV3101 (pSoup-p19) and infiltrated into tobacco leaves. The infiltrated tobacco plants were kept in darkness for 1 d after injection and then under light conditions for 2 d. The Dual-Luciferase® Reporter Assay System Kit (Promega, E1910) was used to measure luciferase activity (Fan et al. 2020). The LUC/REN ratio was calculated as the firefly luciferase chemiluminescence value relative to the Renilla luciferase chemiluminescence value, as previously outlined (Hellens et al. 2005), and was normalized to the control group, which comprised the pGreenII 62-SK empty vector with pGreenII 0800-proGsCGT-LUC. Statistical analysis was conducted using a t-test. For multiple comparisons, SPSS software was employed, applying Duncan's (D) test with a significance level (P) of 0.05.
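Locating W-box elements of the kind mutated here can be sketched as a simple motif scan. The example below searches both strands of a promoter for the W-box consensus TTGACC/T (reverse complement A/GGTCAA) and shows how replacing one site removes it from the hit list. The promoter fragment and the replacement bases are hypothetical; they are not the GsCGT promoter or the mutations used in this study.

```python
import re

# W-box consensus TTGAC[C/T]; lookaheads allow overlapping matches.
W_BOX_FORWARD = re.compile(r"(?=(TTGAC[CT]))")
# Reverse complement of the consensus, i.e. a W-box on the minus strand.
W_BOX_REVERSE = re.compile(r"(?=([AG]GTCAA))")


def find_w_boxes(promoter):
    """Return sorted (start, strand, site) tuples, 0-based on the plus strand."""
    seq = promoter.upper()
    hits = [(m.start(), "+", m.group(1)) for m in W_BOX_FORWARD.finditer(seq)]
    hits += [(m.start(), "-", m.group(1)) for m in W_BOX_REVERSE.finditer(seq)]
    return sorted(hits)


def mutate_site(promoter, start, replacement="AAAAAA"):
    """Replace the 6-bp site at `start` (the replacement is an assumption)."""
    return promoter[:start] + replacement + promoter[start + 6:]


# Hypothetical 20-bp fragment carrying one W-box on each strand.
toy_promoter = "ACGTTGACCAAGGGTCAATT"
wild_type_hits = find_w_boxes(toy_promoter)
mutant_hits = find_w_boxes(mutate_site(toy_promoter, wild_type_hits[0][0]))
print(wild_type_hits)
print(mutant_hits)
```

Scanning the wild-type and mutated sequences with the same function is a quick way to check that a designed mutation abolishes only the intended site.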

In the Y1H assay, given that GsWRKY95 activates the W1 box of the GsCGT promoter, the W1 box-containing promoter fragment (proGsCGT-W1) was integrated into the pBait-AbAi vector. The full-length coding sequence (CDS) of GsWRKY95 was ligated into the pGADT7 vector at the XhoI and KpnI restriction sites using SoSoo Mix (Beijing Tsingke Biotech Co., Ltd). The resulting recombinant vectors were co-transformed into the Y1HGold yeast strain, which was cultivated on SD/−Leu medium for 3 d (Fan et al. 2020). The yeast cells were then transferred to SD/−Leu medium supplemented with 0, 100, 200, 400, 600, 800, and 1000 ng/mL Aureobasidin A (AbA), and yeast growth at each concentration was assessed for the final screening of the interaction.

The interaction between GsWRKY33 and GsWRKY95 was validated using the LCI technique. The full-length CDS of GsWRKY95, with the stop codon removed to allow in-frame fusion, was cloned into the KpnI-SalI sites of the pCAMBIA1300-NLUC vector, enabling fusion with the N-terminal fragment of LUC under the control of the 35S promoter. The full-length CDS of GsWRKY33 was cloned into the KpnI-SalI sites of the pCAMBIA1300-CLUC vector, enabling fusion with the C-terminal fragment of LUC under the control of the 35S promoter. LCI assays were conducted following protocols outlined in previous studies (Luo et al. 2021; Sun et al. 2017). Briefly, the constructed vectors were transformed into Agrobacterium strain GV3101 (pSoup-p19), followed by overnight incubation at 28°C with continuous shaking. The bacterial suspensions were adjusted to a final OD600 of 0.5 in infiltration buffer and mixed at a 1:1 (v:v) ratio. The suspensions containing the various construct combinations were then infiltrated into tobacco leaves, and the plants were kept in the dark for 24–48 h. Finally, the tobacco leaves were excised and sprayed with 1 mmol/L luciferin, and the luminescence signal was captured using a chemiluminescence imaging apparatus (NightSHADE LB 985, Germany). The primers used in this study are listed in Table S1.

Results

Identification and characterization of GsWRKY gene family

In this study, a total of 102 GsWRKY members were identified in the G. styracifolia genome and designated GsWRKY1 to GsWRKY102 based on their chromosomal distribution (Table S2). ExPASy was used to analyze essential features of all GsWRKY TFs, namely amino acid length, MW, pI, and GRAVY. As detailed in Table S2, the 102 GsWRKY proteins range from 139 to 758 aa in length, with MWs of 16.53–83.92 kDa. Their GRAVY values range from −1.315 to −0.217, indicating that they are hydrophilic proteins.

Phylogenetic analysis of GsWRKY TFs

To elucidate the evolutionary relationships among GsWRKYs, we constructed a phylogenetic tree using the 102 GsWRKYs along with 72 AtWRKYs from Arabidopsis thaliana (Fig. 1). As depicted in Fig. 1, GsWRKYs fall into three groups (Group I, II, and III), a classification consistent with findings in rice, Arabidopsis, and tomato (Huang et al. 2012; Wu et al. 2005). Group I GsWRKYs possess two WRKY domains, namely the N-terminal (IN) and C-terminal (IC) WRKY domains, which cluster into distinct clades within the phylogenetic tree, in line with their amino acid sequence similarities (Fig. S1). The 68 Group II GsWRKYs are further subdivided into subgroups IIa (8), IIb (17), IIc (21), IId (11), and IIe (11). The 16 members of Group III contain a single conserved WRKY domain, and their C2HC zinc finger structure differs from the C2H2 type of the other two groups.

Fig. 1

Phylogenetic tree of GsWRKYs and AtWRKYs. GsWRKYs, Grona styracifolia WRKYs; AtWRKYs, Arabidopsis thaliana WRKYs

Identification of conserved motifs in GsWRKY TFs

To characterize the sequence features, MEME was employed to identify conserved motifs in GsWRKYs (Fig. S2). As detailed in Table S3, a total of 10 motifs were identified in GsWRKYs, with motifs 1–10 denoted as M1–M10. In general, GsWRKYs within the same branch exhibit similar motif composition and patterns. The shared motifs and their distribution within a clade suggest that members of the same clade may have conserved functions, whereas lineage-specific motifs may contribute to distinct functions. Notably, M3 is unique to Group I, while M5 and M7 are specific to Group IIa/b. The conserved motifs M1 and M2 are present across all GsWRKY members, and M1 contains the conserved WRKYGQK heptapeptide at the N-terminal end in all GsWRKY members except GsWRKY6.

Genomic characterization of GsWRKYs

A previous study proposed that ancestral WRKY genes, featuring two WRKY domains, underwent multiple rounds of duplication during plant evolution, resulting in WRKY gene expansion (Zhang and Wang 2005). In soybean, the WRKY gene family expanded mainly through segmental duplication events (Yin et al. 2013). To explore the genomic evolutionary events contributing to the abundance of GsWRKY members, we conducted a genomic characterization of GsWRKYs. As illustrated in Fig. 2, the 102 GsWRKYs are unevenly distributed across the 11 chromosomes (Chr) of G. styracifolia. Chr1 harbors the largest number of GsWRKYs (19), while Chr7 contains only three. Intriguingly, 11 tandemly duplicated GsWRKYs are located on Chr2 (2), Chr5 (7), and Chr9 (2) (Fig. 2). Additionally, whole-genome duplication analysis identified 81 pairs of segmentally duplicated GsWRKYs, encompassing 73 GsWRKYs, with the majority located on Chr1, Chr2, and Chr3 (Fig. 3). These duplication events may contribute to the uneven distribution of GsWRKYs among chromosomes and potentially play a role in regulating the biosynthesis of schaftoside in G. styracifolia.

Fig. 2

Chromosomal distribution of Grona styracifolia WRKYs (GsWRKYs). The red line indicates tandemly duplicated GsWRKYs

Fig. 3

Interchromosomal relationship analysis of Grona styracifolia WRKYs (GsWRKYs). The line graph represents gene distribution on the positive strand, and the heatmap represents gene distribution on the negative strand. The gray lines represent all synteny blocks in the G. styracifolia genome, and the blue lines represent segmentally duplicated pairs of GsWRKY genes

Collinearity analysis of GsWRKYs with WRKYs from seven representative plants was conducted to elucidate the evolutionary history of WRKY genes (Fig. S3). In total, we identified 90 pairs of GsWRKYs showing synteny with WRKYs from other plants. Among them, 39 and 95 pairs of GsWRKYs exhibited collinearity with diploid rice and Arabidopsis, respectively. Notably, the diploid galegoid clade legume species C. arietinum and M. truncatula displayed 133 and 155 pairs of WRKYs collinear with G. styracifolia, respectively. Furthermore, GsWRKYs showed collinearity with 149 and 408 pairs of WRKYs in the diploid genistoid clade legume species L. angustifolius and the tetraploid millettioid clade legume species G. max, respectively. These collinearity results align with the phylogenetic relationships between G. styracifolia and the seven representative plants.

The Ka/Ks ratio serves as an indicator of the selection pressure acting on a gene during its evolutionary history. Ka/Ks ratios <1 or >1 suggest purifying (negative) or positive selection, respectively, while a ratio of 1 signifies neutral evolution. In our study, the Ka/Ks ratios of duplicated GsWRKYs resulting from both tandem and segmental duplication events are consistently less than 1, implying that these duplicated GsWRKYs have undergone purifying selection (Fig. S4). Furthermore, the Ka/Ks ratios of syntenic WRKY pairs between G. styracifolia and the other seven species are also less than 1. This observation indicates that plant WRKYs are subject to purifying selection, emphasizing the evolutionary conservation of their functions (Song et al. 2018).
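The interpretation rule above can be expressed as a small sketch. The function name and the Ka and Ks values in the example are hypothetical and are not taken from Fig. S4; the exact comparison with 1 is kept only to mirror the textbook rule, whereas in practice ratios are usually judged with some statistical tolerance.

```python
def classify_selection(ka, ks):
    """Classify selection pressure on a gene pair from its Ka/Ks ratio."""
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    ratio = ka / ks
    if ratio < 1:
        return ratio, "purifying"
    if ratio > 1:
        return ratio, "positive"
    return ratio, "neutral"


# Hypothetical substitution rates for two duplicated gene pairs.
for pair_name, ka, ks in [("pair_1", 0.12, 0.60), ("pair_2", 0.45, 0.30)]:
    ratio, label = classify_selection(ka, ks)
    print(f"{pair_name}: Ka/Ks = {ratio:.2f} ({label} selection)")
```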

Identification of candidate GsWRKYs regulating schaftoside biosynthesis by WGCNA

Utilizing the RNA-Seq transcriptomic dataset from roots, stems, and leaves subjected to various age conditions or treatments, WGCNA was performed to identify potential GsWRKYs involved in the regulation of the schaftoside biosynthetic gene GsCGT. The co-expression network revealed that GsCGT coexpresses with key genes in the schaftoside biosynthetic pathway, including three GsCHSs (CHSGs01G10400, CHSGs01G10390, and CHSGs01G10320), two GsCHIs (CHIGs04G01000 and CHIGs04G00980), and two GsF2Hs (F2HGs07G12500 and F2HGs07G12490), supporting its involvement in schaftoside biosynthesis in G. styracifolia. Notably, GsCGT also exhibits coexpression with specific GsWRKYs. Two Group I GsWRKYs, namely GsWRKY44(I) and GsWRKY95(I), along with 14 Group II GsWRKYs, namely GsWRKY7(IIe), GsWRKY9(IIe), GsWRKY21(IIe), GsWRKY27(IIb), GsWRKY39(IIb), GsWRKY49(IIe), GsWRKY53(IIb), GsWRKY62(IIe), GsWRKY66(IIb), GsWRKY70(IIc), GsWRKY74(IIb), GsWRKY90(IId), GsWRKY93(IIb), and GsWRKY98(IIe), show coexpression with GsCGT. The pair of segmentally duplicated GsWRKYa and GsWRKYb is hereafter designated as GsWRKYa-b. Strikingly, among these, nine GsWRKYs form five pairs of segmentally duplicated genes, namely GsWRKY7-62, GsWRKY9-21, GsWRKY66-74, GsWRKY27-93, and GsWRKY74-93. Among the segmentally duplicated GsWRKYs, there is a prominent diversity in terms of their coexpression with GsCGT. Interestingly, GsWRKY95, a segmentally duplicated member, displays a broad coexpression profile, encompassing two GsC4Hs, eight GsCHSs, three GsCHIs, and four GsF2Hs. This extensive coexpression pattern strongly implies that GsWRKY95 plays a pivotal role in modulating the biosynthesis of schaftoside in G. styracifolia to a large extent. Conversely, all 11 tandemly duplicated GsWRKYs fail to exhibit coexpression with GsCGT, as illustrated in Fig. 4. This suggests a diminished contribution of tandemly duplicated GsWRKYs to the regulation of schaftoside biosynthesis in G. styracifolia.

Fig. 4

Network of Grona styracifolia WRKYs (GsWRKYs) involved in the biosynthesis of schaftoside in G. styracifolia. (a) The network of GsWRKYs coexpressed with Grona styracifolia CGT (GsCGT) and of schaftoside biosynthetic genes associated with GsWRKY95. (b) The regulatory network of the segmentally duplicated, GsCGT-coexpressing GsWRKYs identified in (a). (c) The regulatory network of GsWRKYs coexpressed with the Grona styracifolia chalcone synthases (GsCHSs) and/or Grona styracifolia flavanone 2-hydroxylases (GsF2Hs) identified in (a). Correlations among genes/nodes with R > 0.6 are presented. Rounded rectangle, V-shaped, and triangle nodes indicate Group I, II, and III GsWRKYs, respectively. Hexagon nodes indicate schaftoside biosynthetic genes. For Group II GsWRKYs, nodes of the same color, except for green, indicate segmentally duplicated GsWRKYs. Orange Group I GsWRKYs are segmentally duplicated GsWRKYs

The segmentally duplicated pairs coexpressed with GsCGT, namely GsWRKY7-62, GsWRKY9-21, and GsWRKY66-74, were further investigated to determine whether they are coexpressed with other GsWRKYs and thus potentially participate in a hierarchical orchestration of schaftoside biosynthesis. As depicted in Fig. 4b, these segmentally duplicated GsWRKYs are coexpressed with a total of 33 GsWRKYs, forming two distinct modules. Among them, 21 GsWRKYs form 14 pairs of segmentally duplicated GsWRKYs. Notably, GsWRKY26-90, GsWRKY39-83, and GsWRKY42-52 exhibit coordinated expression with GsWRKY7-62 and GsWRKY66-74 in the GsWRKY21 module, whereas GsWRKY9 is coexpressed with GsWRKY8-84 and GsWRKY16-25, delineating another module. This pattern of coexpression suggests a hierarchical and diverse regulatory framework orchestrated by GsWRKYs in the biosynthesis of schaftoside. Given that CHS is a well-known rate-limiting enzyme in the biosynthesis of flavonoids, including schaftoside, the GsCGT-coexpressing GsCHSs were used as bait to uncover GsWRKYs coexpressed with this biosynthetic pathway. As illustrated in Fig. 4c, a total of 24 GsWRKYs, comprising three Group I and 21 Group II GsWRKYs, are coexpressed with the three GsCHSs. Among them, 15 GsWRKYs form 13 pairs of segmentally duplicated GsWRKYs coexpressed with GsCHSs. These findings indicate that GsWRKYs, particularly segmentally duplicated Group II GsWRKYs, may play an important role in the regulation of schaftoside biosynthesis.

Spatiotemporal expression of GsWRKYs

To elucidate the role of GsWRKYs in various tissues, the spatiotemporal expression patterns of GsWRKYs in roots, stems, leaves, and flowers were analyzed. As shown in Fig. 5a, hierarchical clustering of the expression patterns yields two subclades, distinguished by predominant expression in roots or in other tissues. Notably, more GsWRKYs are abundantly expressed in roots and/or stems than in flowers and leaves. Interestingly, the tandemly duplicated GsWRKYs located on Chr2 (GsWRKY34 and GsWRKY35) and Chr9 (GsWRKY88 and GsWRKY89) cluster within the same clade and are abundantly expressed in stems and leaves (Fig. 5a). This suggests potential redundancy in the spatiotemporal expression of these four GsWRKYs, which are homologous to AtWRKY40. In contrast, the seven tandemly duplicated GsWRKYs on Chr5 exhibit divergent expression, with GsWRKY59 clustering with GsCGT and GsWRKY56 clustering with the GsCGT-coexpressing GsWRKY95, pointing to functional divergence among them. GsCGT, homologous to a previously identified CGT enzyme responsible for schaftoside biosynthesis (Wang et al. 2020), has also been confirmed to play this role in G. styracifolia (Ding et al., unpublished data). Importantly, the GsCGT-coexpressing GsWRKY9, GsWRKY27, and GsWRKY93, along with GsCGT, are predominantly expressed in roots and stems, which constitute a larger proportion of biomass than leaves. This suggests that these three GsWRKYs are promising candidate genes for genetic improvement efforts.

Fig. 5
Spatiotemporal expression profile of Grona styracifolia WRKY (GsWRKY). (a) The spatial expression profile of GsWRKYs. (b) The hierarchical heatmap of GsWRKYs spatially expressing in 10-month-old seedlings compared to 28 d seedlings. (c) The hierarchical heatmap of GsWRKYs spatially expressing in 28 d seedlings with high-light treatment compared to those in control conditions. Grona styracifolia CGT (GsCGT) and its positive regulator GsWRKY95 are labeled in red. Tandemly duplicated GsWRKY are denoted in green. Red arrows indicate GsCGT-coexpressing GsWRKYs presented in Fig. 5a. Blue arrows denote the TTG2-like GsWRKYs

Age- and light-regulated GsWRKYs

The regulatory mechanisms governing the biosynthesis of bioactive components in medicinal plants involve both external signals, such as light, and internal signals, such as developmental stage (age) (Liu et al. 2004; Ouyang et al. 2016). To investigate the role of GsWRKYs in modulating schaftoside biosynthesis, the effects of HL and age on the expression of GsWRKYs and GsCGT were examined. As depicted in Fig. 5b, the expression of GsCGT in stems is upregulated by age, and GsCGT clusters with GsCGT-coexpressing GsWRKY62 and GsWRKY93. This suggests that these genes might be involved in age-mediated biosynthesis of schaftoside in stems. As illustrated in Fig. 5c, the expression of GsCGT in leaves is upregulated by HL treatment, and GsCGT clusters with GsCGT-coexpressing GsWRKY21, GsWRKY44, and GsWRKY98. This implies that these three GsWRKYs may participate in HL-mediated biosynthesis of schaftoside in leaves. Notably, the upregulation of GsWRKY95 expression in roots by both age and HL indicates that the GsWRKY95-GsCGT module orchestrates the biosynthesis of schaftoside in roots in response to age or HL induction. Moreover, GsCGT-coexpressing GsWRKY74 also clusters with the GsCGT regulator GsWRKY95, indicating a substantial contribution of GsWRKY74 to age- or HL-mediated biosynthesis of schaftoside in roots. Additionally, all 11 tandemly duplicated GsWRKYs exhibit distinct responses to age and HL treatment compared to their homologous counterpart(s), indicating subfunctionalization during evolutionary history.
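Comparisons such as older versus 28 d seedlings, or HL-treated versus control seedlings, are commonly summarized as log2 fold changes. The sketch below shows that calculation under stated assumptions: the text does not specify how the heatmap values in Fig. 5b and c were computed, and the pseudocount of 1 and the example numbers are hypothetical choices made only to keep the ratio defined when expression is zero.

```python
from math import log2


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of treated to control expression.

    The pseudocount (an assumption, not taken from the study) avoids
    division by zero for genes with no detectable expression.
    """
    return log2((treated + pseudocount) / (control + pseudocount))


# Hypothetical expression values (illustration only).
upregulated = log2_fold_change(treated=31.0, control=7.0)
unchanged = log2_fold_change(treated=0.0, control=0.0)
downregulated = log2_fold_change(treated=3.0, control=15.0)
print(upregulated, unchanged, downregulated)
```

A positive value indicates upregulation by the treatment (or by age), a negative value indicates downregulation, and zero indicates no change.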

Functional divergence of segmentally duplicated GsWRKYs

To elucidate the functional divergence of duplicated GsWRKYs in the regulation of schaftoside biosynthesis, a hierarchical expression profile of GsWRKYs across different tissues was constructed. Figure S5 illustrates that the expression patterns of GsCGT and six other GsWRKYs, as detected by qPCR analysis, are consistent with the RNA-Seq data, affirming the reliability of the RNA-Seq data. Given the focus on segmentally duplicated GsWRKYs and GsCGT-coexpressing GsWRKYs, the expression patterns reveal intriguing insights. As depicted in Fig. 6, GsCGT-coexpressing GsWRKYs, such as GsWRKY15, GsWRKY44, GsWRKY95, and GsWRKY7-62, cluster closely with GsCGT. The hierarchical expression pattern of GsWRKYs indicates that certain segmentally duplicated GsWRKY members exhibit diversity. For instance, in the case of GsWRKY44-85, while GsWRKY44 clusters with GsCGT, GsWRKY85 clusters with F2HGs01g18240 in a different clade. Similarly, GsWRKY15-27 show variability, with GsCGT clustering with GsWRKY15 but not GsWRKY27. Although GsWRKY9-21 coexpress with GsCGT, they do not group with the GsCGT cluster. Notably, GsWRKY7-62, which coexpress with GsCGT, are grouped in the same clade. Furthermore, the segmentally duplicated pair GsWRKY74-15 and GsWRKY74-11, while belonging to the same duplicated set, are grouped in different clades. Interestingly, GsWRKY74, but not GsWRKY11 and GsWRKY15, coexpresses with GsCGT. These findings suggest that segmentally duplicated GsWRKYs exhibit functional diversity and may play tissue-specific roles in regulating the expression of GsCGT.

Fig. 6
Heatmap of hierarchical expression profile of Grona styracifolia WRKY (GsWRKY) and schaftoside biosynthetic genes. Arrows indicate GsWRKYs coexpressing with Grona styracifolia CGT (GsCGT) as presented in Fig. 5. Circles indicate segmental duplication GsWRKYs coexpressing GsCGT. Triangles indicate tandem duplication GsWRKYs. The same color shows that they are paired segment duplication GsWRKYs

Validation of GsWRKY95 regulating schaftoside biosynthesis

To affirm the regulatory role of GsWRKY95 in GsCGT-mediated schaftoside biosynthesis, Y1H and DLA assays were conducted to investigate the binding of GsWRKY95 to the GsCGT promoter and its activation of GsCGT expression, respectively (Fig. 7). Promoter analysis revealed that the GsCGT promoter contains two W boxes, namely the W1 and W2 boxes (Fig. 7a). To pinpoint the specific W box targeted by GsWRKY95, constructs with W-box mutations were designed (Fig. 7b and c). As illustrated in Fig. 7d, GsWRKY95 effectively activates the expression of GsCGT with a 1177 bp promoter sequence. Further investigation into the binding of GsWRKY95 to W1 and/or W2 was performed through W-box mutation analysis employing a DLA assay (Fig. 7e). These results suggest that GsWRKY95 activates GsCGT expression by targeting the W1 box. This finding was corroborated by a Y1H assay, confirming the physical binding of GsWRKY95 to the W1 box of the GsCGT promoter (Fig. 7f and g). In summary, these results provide robust evidence that GsWRKY95 physically binds to the W1 box of the GsCGT promoter and activates GsCGT expression.
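The promoter analysis and W-box mutation logic can be sketched as follows, using the W-box consensus TTGACC/T given in the Introduction. The promoter string below is a short hypothetical sequence, not the 1177 bp GsCGT promoter, and the AAAAAA replacement is an illustrative mutation only; the mutant sequences actually used in the constructs are not given in the text.

```python
import re

# W-box consensus TTGACC/T; the lookahead allows overlapping matches.
W_BOX = re.compile(r"(?=(TTGAC[CT]))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")
SITE_LENGTH = 6


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_w_boxes(promoter):
    """Return sorted (start, strand, site) tuples for W boxes on both strands.

    Start positions are 0-based and always refer to the forward strand.
    """
    seq = promoter.upper()
    hits = [(m.start(), "+", m.group(1)) for m in W_BOX.finditer(seq)]
    for m in W_BOX.finditer(reverse_complement(seq)):
        forward_start = len(seq) - (m.start() + SITE_LENGTH)
        hits.append((forward_start, "-", m.group(1)))
    return sorted(hits)


def mutate_site(promoter, start, replacement="AAAAAA"):
    """Overwrite one site with a replacement (hypothetical mutant sequence)."""
    return promoter[:start] + replacement + promoter[start + len(replacement):]


# Hypothetical promoter fragment with one W box on each strand.
promoter = "AAATTGACCGGGGGTCAATTT"
wild_type_sites = find_w_boxes(promoter)
mutant = mutate_site(promoter, wild_type_sites[0][0])
mutant_sites = find_w_boxes(mutant)
print(wild_type_sites)
print(mutant_sites)
```

Mutating one site and rescanning confirms that only the intended W box is lost, which mirrors the reasoning behind testing W1 and W2 mutants separately.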

Fig. 7
Group Ic GsWRKY95 activating the expression of Grona styracifolia CGT (GsCGT). (a) WRKY-binding cis-element W-boxes distributed in the promoter of GsCGT. (b), (c) The constructs used for the dual-luciferase assay (DLA). (d), (e) The transcriptional activity of Grona styracifolia WRKYs (GsWRKYs) on the W1-W2 boxes and their mutants detected by DLA (n = 3; different letters indicate significant differences at the P < 0.05 level). (f) Diagram of the synthesized proGsCGT-W1 fragment containing only the W1-box. (g) Yeast one-hybrid (Y1H) assay showing that GsWRKY95 physically binds proGsCGT-W1. AD, pGADT7-AD vector; AD-GsWRKY95, pGADT7-GsWRKY95; SD/-Leu + AbA* represents the growth at maximum inhibition concentration

PPI and promoter analysis of GsWRKYs regulating schaftoside biosynthesis

Given that WRKY TFs typically bind DNA in a dimeric manner, predicting interactions among GsWRKY members becomes crucial. Figure 4 highlights the direct and indirect involvement of 42 GsWRKYs, homologous to 29 AtWRKYs, in schaftoside biosynthesis, with 22 GsWRKYs, namely GsWRKY44 (Group I), GsWRKY60 (Group III), and 20 Group II GsWRKYs, interacting with each other, as predicted by the STRING database (Fig. 8). Notably, Group II GsWRKYs, particularly Group IIb GsWRKYs, seem to play pivotal roles in forming dimers that regulate schaftoside biosynthesis, either directly or indirectly. In Fig. 8a, the interaction between GsWRKY44 and GsWRKY87 is predicted. Interestingly, LCI assays reveal that GsWRKY95, homologous to GsWRKY44, associates with GsWRKY33, homologous to GsWRKY87 (Figs. 1 and 9). These findings indicate that these GsWRKYs may form dimers, either directly and/or indirectly, contributing to the modulation of schaftoside biosynthesis.

Fig. 8
Protein–protein interaction (PPI) of Grona styracifolia WRKYs (GsWRKYs) corresponding to Arabidopsis thaliana WRKYs (AtWRKYs) orthologs predicted by STRING search tool. Potential PPI between GsWRKY members (a) and of GsWRKYs with other proteins (b). The node size is proportional to the number of interacted proteins

Fig. 9
Grona styracifolia WRKY95 (GsWRKY95) interacts with GsWRKY33. (a) Schematic representation of two constructs used in the luciferase complementation imaging (LCI) assays. (b) LCI assays indicating the interaction between GsWRKY33 and GsWRKY95 in tobacco leaves

In Fig. 8b, the interaction predictions reveal that MPK3 and MPK4 interact with several GsWRKYs, suggesting that MPK3/4-mediated phosphorylation of GsWRKYs, including GsWRKY44, may be implicated in schaftoside biosynthesis. Additionally, GsWRKY34 is predicted to associate with the chlorophyll biosynthesis module, including Arabidopsis thaliana GENOMES UNCOUPLED 5 (AtGUN5), AtGUN4, AtCHL1/2, and AtCHLM. Notably, the promoters of the 42 GsWRKYs directly or indirectly related to schaftoside biosynthesis contain light-responsive cis-elements, such as Box4 (ATTAAT) and G-box (Fig. S6). Furthermore, AtWRKY25, known to respond to various abiotic stresses, especially salt stress, is predicted to associate with GsWRKYs. These findings suggest that GsWRKYs may play a role in mediating abiotic stresses, particularly light stress, to orchestrate schaftoside biosynthesis. As illustrated in Fig. S6, the promoter analysis reveals the presence of cis-elements responding to MeJA (CGTCA motif and TGACG motif), ABA (ABRE motif), and WRKY TFs (W-box) in the promoters of the 42 GsWRKYs. This pattern is consistent with the promoter configuration of Eucommia ulmoides WRKYs (EuWRKYs) (Liu et al. 2021). Interestingly, the WRKY network includes four TGA TFs (TGA1, TGA3, TGA4, and TGA6), which are essential for N-hydroxypipecolic acid-induced systemic acquired resistance through transcriptional reprogramming (Yildiz et al. 2023). These insights suggest that GsWRKYs responding to both abiotic and biotic stresses play crucial roles in the biosynthesis of schaftoside in G. styracifolia, contributing to its adaptation to environmental stresses.

Discussion

Duplicated GsWRKYs contribute to schaftoside biosynthesis

Herein, a comprehensive set of 102 GsWRKYs was successfully identified within the G. styracifolia genome, characterized and categorized into Group I, II, and III based on both the phylogenetic tree and conserved domain analysis. The observed number of GsWRKYs slightly surpasses that found in other millettioid clade legume plants, with the exception of soybean (Song et al. 2018; Yang et al. 2017). This discrepancy in numbers may be attributed to a recent genome duplication event documented in soybean (Yin et al. 2013). Conversely, the number of GsWRKYs closely aligns with that observed in the genistoid legume plant L. angustifolius (Song et al. 2018). Within these groups, there are 12 (Group I), 54 (Group II), and 16 (Group III) duplicated GsWRKYs members, as detailed in Tables S4 and S5. Specifically, 11 and 71 GsWRKYs arise from tandemly and segmentally duplicated genes, respectively. Prior studies have consistently reported that such duplication events play a pivotal role in the expansion of the WRKY gene family (Gao et al. 2023; Yang et al. 2017; Yin et al. 2013). Given this context, it is imperative to explore the potential contribution of duplicated GsWRKYs to the biosynthesis of schaftoside in G. styracifolia.

Remarkably, GsWRKY34 and GsWRKY88 are derived from both tandem and segmental duplication events. As illustrated in Fig. 4, GsCGT does not exhibit coexpression with the 11 tandemly duplicated GsWRKYs. In contrast, the results from WGCNA suggest that the 17 pairs of segmentally duplicated GsWRKYs are connected, either directly or indirectly, with the biosynthesis of schaftoside (Fig. 4). This finding implies that segmentally duplicated GsWRKYs might exert a more substantial influence on schaftoside biosynthesis than their tandemly duplicated counterparts. Regarding the specific roles of the 17 pairs of segmentally duplicated GsWRKYs, it is noteworthy that GsWRKY16-25 shares homology with AtWRKY32, a regulator known to promote photomorphogenesis by activating Arabidopsis thaliana ELONGATED HYPOCOTYL 5 (AtHY5) expression (Zhou et al. 2022). Additionally, GsWRKY4-32 exhibit homology with OsWRKY13 in rice, where OsWRKY13 plays a pivotal role in mediating crosstalk between abiotic and biotic stress signaling pathways by selectively binding to different cis-elements (Xiao et al. 2013). Overall, these homologies suggest that environmental factors play a crucial role in orchestrating schaftoside biosynthesis, a process mediated by the duplicated GsWRKYs in G. styracifolia. This aligns with the broader perspective that gene duplication events contribute to genomic adaptation in response to a changing environment (Kondrashov 2012).

Segmentally duplicated GsWRKYs coexpressing GsCGT

Notably, GsCGT exhibits coexpression with eight GsWRKYs forming five pairs of segmentally duplicated genes, exemplified by GsWRKY7-62, GsWRKY9-21, and GsWRKY66-74. Interestingly, GsWRKY7-62 and GsWRKY9-21 share homology with four AtWRKYs implicated in ABNORMAL THERMOMORPHOGENESIS (ABT); GsWRKY21, GsWRKY9, GsWRKY7, and GsWRKY62 are homologous to AtWRKY14/ABT1, AtWRKY35/ABT2, AtWRKY65/ABT3, and AtWRKY69/ABT4, respectively. In Arabidopsis, transgenic plants overexpressing ABT1 or its close homologs display insensitivity to high temperatures, while the quadruple mutant exhibits heightened sensitivity to elevated temperatures (Qin et al. 2022). These findings suggest that thermomorphogenesis might play a pivotal role in orchestrating the biosynthesis of schaftoside in G. styracifolia, a plant widely distributed in subtropical South China. Furthermore, GsWRKY66-74 are homologous to AtWRKY72, which is known to play a conserved role in basal defense against root-knot nematodes (Bhattarai et al. 2010). These observations highlight the involvement of both biotic and abiotic stress (temperature) in modulating schaftoside biosynthesis in G. styracifolia. Additionally, GsWRKY7-62 and GsWRKY66-74 exhibit coexpression with two pairs of segmentally duplicated Group II GsWRKYs (GsWRKY15-27 and GsWRKY26-90) and GsWRKY21, while they do not coexpress with the segmentally duplicated counterpart of GsWRKY21, i.e., GsWRKY9. Furthermore, GsWRKY9 is associated with segmentally duplicated GsWRKYs, including GsWRKY16-25 (Group I) and GsWRKY8-84 (Group II). These findings underscore the intricate network formed by these segmentally duplicated GsWRKYs, suggesting their role as hierarchical positive regulators orchestrating the expression of GsCGT. A recent study also indicates that multiple Toona sinensis WRKYs (TsWRKYs) form a core regulatory network, co-expressing with terpenoid structural genes, to modulate terpene synthesis in Toona sinensis (Ren et al. 2023).

Segmentally duplicated GsWRKYs coexpressing GsCHSs

The pivotal role of CHS as a rate-limiting enzyme in flavonoid biosynthesis is well-established. In this study, 15 GsWRKYs, forming 11 pairs of segmentally duplicated GsWRKYs, exhibit coexpression with three GsCGT-coexpressing GsCHSs. Notably, GsWRKY15-27 are homologous to AtWRKY61, which has been shown to enhance Arabidopsis resistance to turnip crinkle virus (Gao et al. 2016). GsWRKY26-90 are homologous to AtWRKY7, which encodes a Ca-dependent calmodulin-binding protein and functions as a modulator of the bZIP28 branch of the unfolded protein response during PAMP-triggered immunity (Arraño-Salinas et al. 2018). GsWRKY39-83 are homologous to AtWRKY6, which regulates PHOSPHATE1 (PHO1) expression in response to low phosphate (Pi) stress and serves as an integration node in multiple leaf senescence signaling pathways (Chen et al. 2009; Zhang et al. 2021). Collectively, these findings suggest that the expansion of segmentally duplicated GsWRKYs, responsive to abiotic and biotic stress, plays a crucial role in determining the metabolic flux directed towards the schaftoside biosynthetic pathway in G. styracifolia. Furthermore, segmentally duplicated GsWRKYs, particularly those belonging to Group II, emerge as hierarchically positive regulators of GsCHSs, thereby activating schaftoside biosynthesis.

GsWRKY95 modulates schaftoside biosynthesis

Typically, TTG2 homologs are well-conserved regulators of flavonoid biosynthesis across various species (Amato et al. 2017; Guo et al. 2023). As illustrated in Fig. 1, GsWRKY37, GsWRKY76, and GsWRKY79 are identified as homologs of Arabidopsis TTG2/AtWRKY44. However, our analysis reveals that these three GsWRKYs do not coexpress with GsCGT and GsCHSs, indicating that they may not play a regulatory role in the biosynthesis of schaftoside in G. styracifolia, at least under the defined coexpression cutoff (R>0.6). Contrastingly, our study highlights GsWRKY95 as a key regulator activating the expression of GsCGT. The homolog of GsWRKY95 in Arabidopsis, AtWRKY4, has been associated with salt stress tolerance and promotion of root growth under salt stress conditions (Li et al. 2021). This positions GsWRKY95 as a promising candidate gene for the improvement of G. styracifolia, owing to its positive roles in regulating root growth and schaftoside biosynthesis. Furthermore, GsCGT-coexpressing GsWRKY44, the expression pattern of which clusters with that of GsWRKY95 and GsCGT, is homologous to AtWRKY33, a gene annotated by TAIR to be involved in resistance to pathogens and abiotic stresses, particularly salt stress. Notably, the content of schaftoside (1.8 fold) and its isoform isoschaftoside (1.7 fold) exhibited a significant increase in rice varieties resistant to brown planthoppers when compared to susceptible varieties (Uawisetwathana et al. 2019). Given the central role of GsWRKY44 as a hub gene in the WRKY regulatory network, as depicted in Figs. 4 and 8, it is reasonable to speculate that GsWRKY44 and GsWRKY95, derived from the same phylogenetic clade, reinforce stress-mediated schaftoside biosynthesis in G. styracifolia in response to various stresses. This phenomenon is reminiscent of observations in cotton, where resistance to F. oxysporum is mediated through several group IIc GhWRKYs activating GhMKK2-mediated flavonoid biosynthesis (Wang et al. 2022).

Conclusions

In this comprehensive study, we identified 82 of 102 GsWRKYs as duplicated genes within the G. styracifolia genome. Among these duplications, 71 segmentally duplicated GsWRKYs and 11 tandemly duplicated GsWRKYs were found to be distributed across Chr 2, 5, and 9. Our coexpression analysis revealed that 16 GsWRKYs, including Group I GsWRKY95 and five pairs of duplicated Group II GsWRKYs (GsWRKY7-62, GsWRKY9-21, GsWRKY66-74, GsWRKY27-93, and GsWRKY74-93), exhibit coexpression with GsCGT, a key player in the biosynthesis of schaftoside in G. styracifolia. Moreover, GsCGT-coexpressing GsWRKY95 was further confirmed through DLA and Y1H assays, establishing GsWRKY95 as an activator of GsCGT expression. Furthermore, our investigation extended to the coexpression of three GsWRKY95- and GsCGT-coexpressing GsCHSs, revealing their association with 24 GsWRKYs, including 11 pairs of segmentally duplicated Group II GsWRKYs. These findings emphasize the hierarchical regulatory role of GsWRKYs, particularly segmentally duplicated Group II GsWRKYs, in the complex orchestration of schaftoside biosynthesis in G. styracifolia. Notably, our study unveiled tissue-specific regulatory patterns among several GsCGT-coexpressing GsWRKYs. For instance, GsWRKY9, GsWRKY27, and GsWRKY93 demonstrated tissue-specific modulation in roots and stems, GsWRKY62 and GsWRKY93 exhibited changes in expression with age in stems, and GsWRKY21, GsWRKY44, and GsWRKY98 exhibited responses to HL in leaves. Additionally, GsWRKY95 and GsWRKY74 displayed age- and/or HL-induced expression in roots. These nuanced regulatory patterns suggest that GsWRKYs in this study hold promising potential as candidate genes for genetic improvement of G. styracifolia with enhanced schaftoside content.