Background

In plants, the shoot apical meristem (SAM) is a domed structure at the aerial growth tip that contains stem cells that generate daughter cells that will differentiate to become above-ground organs. The balance of organ initiation and stem-cell maintenance leads to constraints in SAM size and shape [1, 2]. SAM architecture is determined in part by the genetic background, and several genes contributing to the gross morphology of the SAM have been identified through mutant screens [3]. In the world’s most-produced crop maize (Zea mays L.), several SAM-related genes have been identified in this manner; these genes show distinct SAM-specific expression patterns and act at the top of hierarchical networks comprising hundreds of genes [4,5,6,7]. Some cloned SAM-related genes are homeobox genes encoding a protein containing a typical DNA-binding domain of 60 amino acids, known as a homeodomain, which is present in dozens of transcription factors (TFs).

Homeobox proteins can be classified into 14 distinct classes in plants, encompassing well-established subfamilies such as KNOTTED1-like homeobox (KNOX), zinc-finger homeodomain (ZF-HD), WUSCHEL-related homeobox (WOX), and homeodomain leucine zipper (HD-ZIP), as well as emerging evolutionary categories such as plant homeodomain (PHD) and DNA-binding homeobox (DDT) [8]. The fundamental role of homeobox proteins lies in the control they exert over the progression of growth and development. Members of the KNOX subfamily have major roles in orchestrating plant growth and development, including the regulation of meristem formation and the maintenance of organ morphogenesis [9,10,11]. The HD-ZIP subfamily has substantial potential for enhancing plant growth and underlies plant responses to environmental stressors [12, 13]. Similarly, the WOX subfamily contributes to early embryogenesis, the sustained activity of meristematic stem cells, and the development of lateral organs [14,15,16]. Research on homeodomain (HD)-containing proteins in plants began in maize, in which Knotted1 was first identified as controlling leaf differentiation [17, 18]. Subsequently, the functional importance of numerous genes encoding HD-containing proteins, including Rough sheath1 (Rs1), Gnarley1 (Gn1), Narrow sheath1 (Ns1), and Ns2, among others, has been corroborated across diverse plant species [19, 5, 20]. Collectively, these findings underscore the immense regulatory potential of the homeobox TF family in shaping plant growth, development, and responses to environmental stimuli.

Homeobox TFs interact with the promoter regions of specific target genes. A single TF can potentially target hundreds to thousands of genes, and the collective interactions among multiple TFs and their respective targets establish a central regulatory network that governs nearly all aspects of organismal biology [21, 22]. Despite the biological significance of TF regulatory networks, there has been a persistent bottleneck in the identification of such networks. This bottleneck necessitates the collection of vast amounts of information through individual experiments, each targeting a single TF [23, 24]. In the case of plants, additional challenges arise due to the relative lack of specific antibodies that can be used for chromatin immunoprecipitation followed by sequencing (ChIP-seq) assays, as well as the recalcitrance or difficulty of crop transformation for the expression of a construct encoding an epitope-tagged version of a TF of interest [21, 25]. Recent attempts to address this issue have employed genome-wide analyses that combine single ChIP experiments with high-throughput sequencing. For example, a regulatory network consisting of 112 TFs was identified in human colorectal-cancer cells and a network comprising 104 leaf-related TFs was reconstructed in maize [26, 27]. However, it is worth noting that such studies remain relatively scarce, and elucidation of the landscape underlying regulatory networks within plant genomes remains challenging.

We previously devised a transient and simplified cleavage under targets and tagmentation method known as tsCUT&Tag, which can deduce the targets of TFs across different plant tissues and stages based on a comprehensive machine-learning model [28]. Here, we employed tsCUT&Tag, co-expression networks, and gene-regulatory networks to dissect the regulome of homeobox TFs that are strongly associated with SAM maintenance and development in maize. We constructed regulatory networks, offering a valuable resource for dissecting the intricate regulatory mechanisms conferring plant architecture in maize via homeobox TFs. We provide evidence for the pivotal role of WOX13A in regulating Gn1, ultimately influencing plant height. Our findings shed light on the regulatory patterns of SAM-related homeobox TFs and well-known SAM functional genes.

Results

SAM-related homeobox genes are enriched in KNOX, WOX, and ZF-HD subfamilies

Homeobox genes encode TFs that control tissue differentiation and development [29]. Several key homeobox genes have been cloned and shown to function in the SAM in maize. To ascertain the functional relevance of these genes in SAM development, we first assembled 26 well-known SAM functional genes and described their mutant phenotypes [30,31,32,33,34,35,36,37,38,39,40] (Fig. 1A, Additional file 2: Table S1). This set includes nine homeobox TF genes. Of these, Kn1, Gn1, ZmBLH12, and ZmBLH14 displayed elevated expression levels within the SAM and related meristems (Additional file 1: Fig. S1) [4, 5, 17]. We leveraged an existing maize multi-omics integrative network [41] to define the direct interconnected network for these 26 genes at both the co-expression and co-translation levels (Fig. 1B, Additional file 3: Table S2). Focusing on the homeobox family within the co-translation and co-expression integrated network (co-network) of SAM functional genes, three homeobox subfamilies are either co-expressed or co-translated alongside SAM functional genes (Fig. 1C). The KNOX, WOX, and ZF-HD subfamilies have a statistically significantly higher co-expression ratio with SAM functional genes compared to randomly selected control genes (Fig. 1C), suggesting that these subfamilies are more likely to function in the SAM. Next, we established a gene-regulatory network (GRN) centered on the three most significantly enriched subfamilies in the co-network (Fig. 1C), namely KNOX, WOX, and ZF-HD, to scrutinize in more detail the regulatory mechanisms of SAM functional genes.

Fig. 1

Genes co-expressed and co-translated with known SAM functional genes are statistically significantly enriched in the KNOX, ZF-HD, and WOX subfamilies. A Heatmap representation of the expression levels of known SAM functional genes in the SAM, differentiated reproductive tissues, and other mature tissues. B Co-expression and co-translation genome networks of known SAM functional genes. MaxS: max scaling. C Proportion of known SAM functional genes in co-expression and co-translation networks for different homeobox subfamilies compared to random control genes (an equal random selection of genes in all co-networks). ** indicates statistical significance; ns, not significant

Genome-wide binding sites of maize SAM-related TFs

Toward dissecting the molecular mechanisms by which maize SAM-related TFs shape tissue differentiation and development, we cloned full-length coding sequences of SAM-related TF genes from the KNOX, WOX, and ZF-HD homeobox subfamilies from a maize TFome (Additional file 4: Table S3) [42]. To check sequence homology between maize V3 and V5 genome-annotation releases, we aligned protein sequences for each TF. Twenty-three of the 27 studied TFs have identical V3 and V5 protein sequences. Four genes, Zm00001eb354880, Zm00001eb058930, Zm00001eb295920, and Zm00001eb217470, have minor differences between the two versions (Additional file 1: Fig. S2). We conducted tsCUT&Tag experiments by integrating CUT&Tag in B73 protoplasts with ATAC-seq and RNA-seq across various tissues and stages [28] to catalog genomic binding sites for each TF. We successfully performed nearly 100 tsCUT&Tag experiments on three TF subfamilies. Subsequently, a standardized pipeline was used for data processing with stringent criteria for the normalized strand cross-correlation coefficient (NSC) and the relative strand cross-correlation coefficient (RSC), set at > 1.05 and 1.0, respectively (Additional file 5: Table S4). The Pearson correlation coefficient for reproducibility between biological replicates was set at ≥ 0.8 (Additional file 1: Fig. S3). A dataset comprising 27 TFs, each with at least two biological replicates, was compiled. Among them, seven members belong to the KNOX subfamily and 10 each to the WOX and ZF-HD subfamilies. We delineated the genomic relationships among these homeobox TFs through a phylogenetic analysis, finding higher phylogenetic affinity within subfamilies except for the KNOX subfamily member Ns1, which is more closely related to the WOX subfamily members (Additional file 1: Fig. S4).
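The quality criteria described above can be written as a simple filter. The following is a minimal sketch and not the pipeline used in this study: the function names (pearson, passes_qc), the dictionary layout, and the example values are hypothetical. It assumes that NSC and RSC were computed by an upstream tool, that the 1.0 cutoff is a strict lower bound on RSC, and that reproducibility is assessed on binned coverage signals.

```python
from math import sqrt

NSC_MIN = 1.05      # NSC threshold stated in the text
RSC_MIN = 1.0       # RSC threshold stated in the text (assumed strict lower bound)
PEARSON_MIN = 0.8   # replicate reproducibility threshold stated in the text


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)


def passes_qc(replicates):
    """Return True if a TF dataset meets all three criteria.

    `replicates` is a list of dicts with hypothetical keys:
    'nsc', 'rsc', and 'signal' (binned coverage values).
    """
    if len(replicates) < 2:          # at least two biological replicates
        return False
    if any(r["nsc"] <= NSC_MIN or r["rsc"] <= RSC_MIN for r in replicates):
        return False
    # Every pair of replicates must be reproducible.
    for i in range(len(replicates)):
        for j in range(i + 1, len(replicates)):
            r = pearson(replicates[i]["signal"], replicates[j]["signal"])
            if r < PEARSON_MIN:
                return False
    return True


# Hypothetical example data.
good = [
    {"nsc": 1.10, "rsc": 1.2, "signal": [1, 5, 9, 2, 0, 7]},
    {"nsc": 1.08, "rsc": 1.1, "signal": [2, 6, 8, 3, 1, 6]},
]
bad = [
    {"nsc": 1.02, "rsc": 1.2, "signal": [1, 5, 9, 2, 0, 7]},
    {"nsc": 1.08, "rsc": 1.1, "signal": [2, 6, 8, 3, 1, 6]},
]
print(passes_qc(good), passes_qc(bad))
```

Checking every replicate pair, rather than only the first two, keeps the filter valid for TFs with more than two replicates.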

Following a series of stringent bioinformatics analyses and filters (see the “Methods” section), we identified a total of 30,915 reproducible TF-bound sites for these SAM-related TFs (Fig. 2A). The number of binding peaks showed considerable variability among TFs, with a median value of approximately 5002 binding sites per TF (interquartile range [IQR] 1016 to 8677) (Additional file 1: Fig. S5A, Additional file 6: Table S5). The number of regulated target genes detected by tsCUT&Tag experiments ranged from 962 to 7227 per TF (Additional file 1: Fig. S5B, Additional file 6: Table S5). The genomic distribution of binding peaks showed a bias toward the promoter regions of target genes, as previously described [43] (Fig. 2B and C, P < 7.2E−10). Most tested TFs preferentially bound sequences containing core ATTA and CCAA motifs, consistent with Arabidopsis homeobox TFs (Additional file 1: Fig. S6) [44]. With these target-gene datasets for each SAM-related homeobox TF, we then constructed a comprehensive regulome.

Fig. 2

The regulome of maize SAM-related homeobox subfamilies. A Total number and percentage of binding peaks identified by tsCUT&Tag for KNOX, ZF-HD, and WOX subfamily members. B Metaplot showing the distribution of TF binding sites for KNOX, ZF-HD, and WOX family members tested. C Heatmap representation of tsCUT&Tag and ATAC-seq signals over ~ 3 kb of sequence upstream and downstream of the transcription start site (TSS) for the SAM-related homeobox TFs Rs1 and Ns1. D Zm00001eb007470 (ZmRPH1) in the genome browser. ATAC-seq and TF binding results are shown. SAM, shoot apical meristem

We performed gene-ontology (GO) enrichment analysis on the putative target genes of the SAM-related homeobox TFs, using −Log10(FDR) as the enrichment measure. We saw statistically significant enrichment in functions related to transport processes, growth and development, and responses to hormones and stress (−Log10(FDR) > 2, i.e., FDR < 0.01) (Additional file 7: Table S6). For example, several target genes are related to plant-hormone stimuli, including auxin, abscisic acid, gibberellin, cytokinin, and brassinosteroids. Other enriched target genes are involved in organ development and morphogenesis. Taken together, genes from the regulome of SAM-related homeobox TFs have physiological relevance and constitute a network resource for further understanding maize growth and development.

Target hubs and HOT regions are enriched for regulatory genes

Genomic regions with high TF occupancy are frequently associated with important functions [45, 46]. Here, we aggregated all TF-target-gene interactions for the 27 tested TFs. 77.2% of potential target genes are bound simultaneously by multiple TFs. As the number of binding TFs increases, the number of co-regulated target genes decreases, indicating that most TFs co-regulate a limited number of target genes (Additional file 1: Fig. S7A). Notably, the proportion of overlapping target genes regulated by TFs within subfamilies is statistically significantly higher than between subfamilies (P = 0.004, KS-test) (Additional file 1: Fig. S7B).

Hub target genes engage in crosstalk between different signaling pathways and represent important TF targets [47]. We built a random TF-target-gene distribution to delineate the hub target genes by randomizing the relationship between TFs and their potential target genes, while preserving the number of potential target genes for each TF, as previously described (Additional file 1: Fig. S8A) [48]. Based on the 99th percentile values of the randomized distributions, we defined 3168 potential target genes without enhancer enrichment, which are bound by 13 or more TFs, as hub targets (Additional file 8: Table S7). Complementary to hub target genes, we characterized “hot” regions in the genome as regions bound by many TFs. Hot regions differ from hub genes in that a hub gene can be bound by many TFs at different locations, whereas a hot region is a single region bound by many TFs. Their occupancy follows a negative exponential curve [48] (Additional file 1: Fig. S8B). A total of 3053 hot regions without enhancer enrichment were identified, defined as regions bound by 12 or more TFs (Additional file 8: Table S7). These hot regions are linked with 2773 target genes. We observed that 2608 (~ 82.3%) of the 3168 hub targets overlap with hot regions; the remaining 17.7% of hub genes are not associated with hot regions because the many TFs that bind these loci do so at different regions (Additional file 1: Fig. S8C).
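The randomization used to define hub targets can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the procedure of [48]: the function names, the number of iterations, the nearest-rank percentile rule, and the choice to call a gene a hub when its observed TF count strictly exceeds the randomized 99th percentile are all assumptions, and the toy data are hypothetical.

```python
import math
import random
from collections import Counter


def occupancy(tf_targets):
    """Count how many TFs bind each gene (each TF counted once per gene)."""
    counts = Counter()
    for targets in tf_targets.values():
        counts.update(set(targets))
    return counts


def random_threshold(tf_targets, universe, percentile=99.0, n_iter=100, seed=0):
    """Percentile of per-gene TF counts after shuffling TF-target links.

    Each TF keeps its number of targets; only the identity of the targets
    is randomized by sampling without replacement from `universe`.
    """
    rng = random.Random(seed)
    genes = sorted(universe)
    pooled = []
    for _ in range(n_iter):
        counts = Counter()
        for targets in tf_targets.values():
            counts.update(rng.sample(genes, len(set(targets))))
        pooled.extend(counts[g] for g in genes)  # genes never drawn count as 0
    pooled.sort()
    rank = max(1, math.ceil(percentile / 100.0 * len(pooled)))  # nearest-rank rule
    return pooled[rank - 1]


def hub_targets(tf_targets, universe, **kwargs):
    """Genes bound by more TFs than the randomized percentile threshold."""
    threshold = random_threshold(tf_targets, universe, **kwargs)
    hubs = {g for g, c in occupancy(tf_targets).items() if c > threshold}
    return hubs, threshold


# Toy data (hypothetical): 10 TFs and 200 genes; every TF binds g0-g4
# plus 15 genes that no other TF binds.
universe = [f"g{i}" for i in range(200)]
tf_targets = {
    f"TF{t}": universe[:5] + universe[5 + 15 * t: 20 + 15 * t]
    for t in range(10)
}
hubs, threshold = hub_targets(tf_targets, universe)
print(sorted(hubs), threshold)
```

Sampling each TF's targets without replacement preserves the per-TF target number, which is the constraint named in the text; any other null model would give a different threshold.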

Next, we surveyed regulatory complexity and identified a specific set of low-complexity genes, i.e., those bound by a single TF (Additional file 8: Table S7). GO-term enrichment analysis on these genes revealed their association with protein-complex biogenesis and assembly, and growth (Additional file 1: Fig. S8D). By contrast, hub genes exhibited a pronounced enrichment within critical pathways, including signal transduction, photoprotection, defense response, and hormone-signaling pathways, underscoring their important roles in regulatory networks (Additional file 1: Fig. S8E).

We also identified several potential target genes bound by more than 20 TFs. For example, Zm00001eb007470 (ZmRPH1, encoding a microtubule-associated protein) showed a high-TF-occupancy region bound by 24 TFs (Fig. 2D). ZmRPH1 contributes to cytoskeletal architecture and its over-expression induces a dwarf phenotype [49].

KNOX, WOX, and ZF-HD subfamilies partially overlap in binding-site specificity

Some TFs function collaboratively in organisms. To further delineate TF combinations, we visualized genome-wide mapping and quantified the frequency of co-localization of all possible pairs of TFs to target genes, based on the presence of their respective binding peaks within each TF sample (Additional file 1: Fig. S9A and Additional file 1: Fig. S9B). Specifically, we considered all TF pairs that bind to the same target genes via non-overlapping motifs. Most pairs of TFs showed a low degree of co-localization for their binding sites within target genes and co-localizing TFs predominantly belong to the same subfamily.

To evaluate binding-site specificity within the three homeobox subfamilies, we performed a comparative analysis and saw that ~ 51% of all detected peaks are present for at least two subfamilies (Fig. 2A). This finding indicates that although these three subfamilies have distinct binding profiles, they also show substantial overlap in their binding sites. We performed motif analysis using the upstream and downstream regulatory regions of peak summits bound by KNOX-only, WOX-only, or ZF-HD-only members, or bound by all subfamilies (referred to as shared peaks). Shared-peak subsets were enriched for the core motifs ATTA and CCAA. However, WOX-only and KNOX-only peaks returned only one over-represented motif, CCAA. Notably, we detected ATTA-enriched motifs in ZF-HD-only peaks (Fig. 3A).

Fig. 3

Binding specificity of KNOX, ZF-HD and WOX subfamily TFs to unique and shared peaks in their target genes. A Sequence logos for the top motifs identified for sites bound by WOX-only, ZF-HD-only, KNOX-only TFs, or peaks shared among them. Values represent the E-value of each motif. B Venn diagram of target genes overlapping among KNOX, ZF-HD, and WOX subfamilies. C Functional enrichment of WOX-only, ZF-HD-only, KNOX-only, or shared target genes determined by GO analysis. SN, mean SumNormalizer

Although different TFs within the same subfamily shared more target genes (53% average) than different TFs across different subfamilies (49% average; P = 0.004, Additional file 1: Fig. S7B), we observed that at least 74–89% of all target genes from one of the three TF subfamilies were also among the targets of at least one of the other two subfamilies (Fig. 3B). This indicates that the target genes specifically regulated by each subfamily accounted for only 10.7–26.4% of the total regulated by that subfamily. We then tested whether the four categories of target genes (Fig. 3C) were associated with unique functions. GO-term enrichment analysis revealed that genes co-occupied by all members of all three TF homeobox families are statistically significantly enriched in terms related to “plant-organ development,” “flower development,” “leaf development,” “response to hormone,” “fruit development,” and “signal transduction” (Fig. 3C). However, genes exclusively targeted by KNOX-type TFs (KNOX-only) were enriched for terms associated with “plant-type cell wall,” and WOX-only targets were enriched for “chloroplast” and “meristem maintenance.” We detected no statistically significant enrichment for GO terms among high-confidence genes exclusively targeted by ZF-HD-specific TFs (Fig. 3C). Collectively, these results underscore the propensity of KNOX, WOX, and ZF-HD subfamily TFs to frequently co-regulate the same locus.
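The relationship between shared and subfamily-specific targets reported above reduces to set operations. The sketch below is a minimal illustration with hypothetical gene identifiers and a hypothetical function name (shared_fraction); it is not the analysis code of this study. For each subfamily, it computes the fraction of targets that are also targeted by at least one other subfamily; the subfamily-specific fraction is one minus this value.

```python
def shared_fraction(targets_by_subfamily):
    """Fraction of each subfamily's targets also bound by another subfamily."""
    result = {}
    for name, targets in targets_by_subfamily.items():
        # Union of the targets of all other subfamilies.
        others = set().union(
            *(t for n, t in targets_by_subfamily.items() if n != name)
        )
        shared = set(targets) & others
        result[name] = len(shared) / len(targets) if targets else 0.0
    return result


# Hypothetical toy target sets.
targets_by_subfamily = {
    "KNOX": {"a", "b", "c", "d"},
    "WOX": {"b", "c", "e"},
    "ZF-HD": {"c", "d", "f", "g"},
}
fractions = shared_fraction(targets_by_subfamily)
specific = {name: 1.0 - value for name, value in fractions.items()}
print(fractions)
print(specific)
```

In this toy example, three of the four KNOX targets are shared, so the KNOX-specific fraction is 0.25.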

Functional differentiation of SAM-related TFs

Gene duplication is a fundamental mechanism underpinning the functional diversification of paralogous genes resulting from whole-genome duplication (WGD), tandem duplication (TD), duplication mediated by transposition (TRD), proximal duplication (PD), and dispersed duplication (DSD) [50, 51]. To enhance the construction of the regulome, we additionally gathered published ChIP-seq data for three subfamilies, namely Kn1 of KNOX, for analysis [51]. We identified 14 duplicated gene pairs, which we classified as having arisen from WGD (three pairs), PD (one pair), TRD (five pairs), and DSD (five pairs) (Additional file 9: Table S8) [51]. These duplications predominantly arose during the divergence of the Magnoliophyta, Petrosaviidae, Commelinids, and Poaceae and ultimately the speciation of Z. mays (Additional file 1: Fig. S10A and Additional file 1: Fig. S10B, Additional file 9: Table S8).

To probe the functional fates that genes may assume after duplication events, we examined the temporal and spatial expression patterns of duplicated genes across maize development by RNA-seq. We identified two duplicate pairs showing the same overall pattern of expression (i.e., conserved duplications) and two pairs showing distinct expression patterns (i.e., divergent duplications), with the remaining ~ 70% of the duplications showing partial expression divergence (Additional file 9: Table S8). Protein-sequence divergence might also lead to functional differentiation. To explore regulome divergence between duplicated gene pairs, we introduced an index, the divergence score (DS) [52], to assess differentiation between two regulatory networks. Notably, greater protein-sequence similarity was associated with greater similarity between the regulatory networks of duplicated genes (Fig. 4A, R = 0.56, P = 0.01). The randomized DS displays a normal distribution (Additional file 1: Fig. S11). Consequently, we categorized duplications as conserved, partially diverged, or diverged, with a 95% threshold (Fig. 4B, Additional file 9: Table S8). The duplicated pair Zhd1 and Zhd21 is conserved at the regulome level, indicating that these TFs regulate the expression of the same genes. By contrast, four duplicate pairs, such as Kn1 and Lg3, showed divergence in the cohort of genes that each encoded TF regulates, and the remaining pairs (11/14) fell in between, leading us to define them as partially diverged (Fig. 4B).

Fig. 4

Conservation and divergence in networks among duplicated genes. A Increased protein-sequence similarity is associated with greater similarity in gene-regulatory networks (GRN) generated by tsCUT&Tag. B Network rewiring between duplicate pairs of TFs may remain unchanged after duplication (conserved), involve both common and unique connections (partly diverged), or exhibit few common connections (diverged). Examples from each scenario in the classes of duplicates are shown at right. Nodes are denoted as 1:1000. C Pattern of conservation or divergence between transcriptome and regulome data for the same duplicate pairs of TFs. D Differential expression analysis based on RNA-seq between wild type and the kn1 mutant. E GO terms of target genes regulated by Kn1 and Lg3

By integrating the differentiation pattern observed at the transcriptome and regulome levels, we noticed one divergent duplication with the same behavior across these two regulatory levels (Fig. 4C). Kn1 and Liguleless3 (Lg3), a TRD pair, showed functional differentiation: they bear distinct cis-elements within their promoter regions and have different gene structures, resulting in a diversification of combinatorial regulation (Additional file 1: Fig. S12A and B). The duplicate genes also differed in their expression patterns, with Kn1 more highly expressed in differentiated tissues, whereas the two genes were similarly expressed in mature tissues (Additional file 1: Fig. S12C). Additionally, analysis of Kn1 target genes and differentially expressed genes (DEGs) in the kn1 mutant revealed that Kn1 targets Lg3 and up-regulates its expression (Fig. 4D) [17]. Functional enrichment analysis of target genes indicated that both Kn1 and Lg3 exhibit a broad spectrum of hormone-related regulatory functions (Fig. 4E). However, Kn1 targets specifically participate in regulation of circadian rhythm, vegetative phase change, and growth and development of flowers and leaves (Fig. 4E). Meanwhile, Lg3 targets are involved in regulation of leaf senescence (Fig. 4E). These results align with previous findings that Kn1 primarily functions in regulating the differentiation of the SAM into cell groups with distinct functions, and in growth and development of lateral organs. Moreover, Kn1 often interacts with Lg3 and Lg4 to jointly regulate leaf development [51]. We conclude that the duplication of Kn1 that gave rise to Lg3 led to subfunctionalization in fate differentiation. This example reveals that divergence in the connectivity of the regulome between duplicated genes is accompanied by divergence in expression, hinting at functional divergence.

Tissue- or stage-dynamic networks of SAM-related TFs

Plant development and phenotypic variation are governed by precise and multifaceted GRNs [41, 53, 54]. Here, we constructed a GRN using information integrated from all target genes identified by tsCUT&Tag, co-expression, and co-translation [41]. This network contained 235,043 edges and 18,463 nodes, providing a substantial resource to facilitate the functional dissection of maize SAM-related homeobox TFs.

In a typical network, the in-degree of nodes represents the number of genes that can be regulated or interact with a given node. We evaluated the distribution of in-degree values, which showed the characteristics of a scale-free network for the SAM-related homeobox TFs (Additional file 1: Fig. S13A). Moreover, converting in-degree values to their log2 values resulted in a pattern consistent with a power-law distribution (R2 = 0.87) (Additional file 1: Fig. S13B). Hot nodes within the network had more connectivity than others, signifying their importance in shaping information-flow pathways. We selected the top 10% of nodes with the highest degree of connectivity, corresponding to 1844 genes; of these, ~ 45% (822 genes) were also hub targets bound by 13 TFs or more, as determined by our tsCUT&Tag experiments (Additional file 1: Fig. S13C, Additional file 10: Table S9). GO enrichment analysis of these genes underscored their involvement in physiology, including “signal transduction,” “response to hormone,” “organ senescence,” and “defense response,” indicative of the important roles these nodes play within the network (Additional file 1: Fig. S13D).
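The power-law check described above can be illustrated with a log-log least-squares fit. This is a minimal sketch under stated assumptions, not the analysis code of this study: it assumes that both the in-degree and its frequency are log2-transformed before fitting a straight line, and the function name (fit_power_law) and toy degree sequence are hypothetical. A scale-free network gives an approximately linear relationship with a negative slope and a high R2.

```python
from collections import Counter
from math import log2


def fit_power_law(degrees):
    """Least-squares fit of log2(frequency) against log2(degree).

    Returns (slope, intercept, r_squared). Nodes with degree 0 are ignored
    because log2(0) is undefined.
    """
    freq = Counter(d for d in degrees if d > 0)
    ks = sorted(freq)
    if len(ks) < 2:
        raise ValueError("need at least two distinct positive degrees")
    xs = [log2(k) for k in ks]
    ys = [log2(freq[k]) for k in ks]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy ** 2) / (sxx * syy) if syy > 0 else 1.0
    return slope, intercept, r_squared


# Hypothetical degree sequence with frequency proportional to degree^-2:
# 64 nodes of degree 1, 16 of degree 2, 4 of degree 4, 1 of degree 8.
degrees = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
slope, intercept, r_squared = fit_power_law(degrees)
print(slope, intercept, r_squared)
```

A least-squares fit on log-transformed counts is a quick diagnostic only; it does not by itself establish that a power law fits better than alternative heavy-tailed distributions.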

To further dissect the functional roles of SAM-related homeobox TFs, we gathered comprehensive RNA-seq, ribosome profiling (Ribo-seq), and ATAC-seq datasets collected from various B73 tissues: leaves, SAM, internodes, stems, tassels, and endosperm (Additional file 11: Table S10). We defined 18,242–26,493 genes across different tissues or stages as being expressed, using a minimum expression threshold of transcripts per million (TPM) > 1 (Additional file 1: Fig. S14A, Additional file 11: Table S10). We also detected between ~ 50% and ~ 80% of the target genes identified by tsCUT&Tag as being expressed in a specific tissue or stage (Additional file 1: Fig. S14B). ATAC-seq unveiled 4187–27,145 accessible-chromatin regions across tissues and stages. Between these two datasets, 10–74% of the target genes identified by tsCUT&Tag overlap with ATAC-seq open regions in specific tissues or stages. Within accessible-chromatin regions across different tissues, each TF regulates 521–5353 target genes (Additional file 1: Fig. S14C and Additional file 1: Fig. S14D). The transcriptome GRN and translatome GRN also overlapped with the target genes for each TF, accounting for 0.3–27% and 0–30%, respectively, of the target genes detected by tsCUT&Tag (Additional file 1: Fig. S14E and Additional file 1: Fig. S14F).

To construct tissue-specific dynamic regulatory networks, we first designed a machine-learning model using ATAC-seq and tsCUT&Tag data from etiolated seedlings. The accuracy of the multi-layer perceptron models for the 27 TFs ranged from 0.75 to 0.96 (Additional file 12: Table S11). The constructed models were then used to predict TF binding sites in the SAM based on ATAC-seq data. We predicted hundreds of TF binding sites in the SAM (Additional file 13: Table S12). These binding sites were used to build the SAM dynamic regulatory network.

We constructed SAM dynamic networks by integrating multiple datasets. To acquire high-confidence TF-DNA regulatory associations, we required the target genes predicted by tsCUT&Tag and ATAC-seq to coexist in both the transcriptome and translatome GRNs. We also required the target genes to be expressed in the SAM (TPM > 1). Ultimately, we constructed a dynamic SAM network (Fig. 5A, Additional file 14: Table S13). It contained several known functional genes that influence plant architecture, flowering, and grain development, such as BEL1-like homeodomain12 (ZmBLH12), BEL1-like homeodomain14 (ZmBLH14), and Fasciated Ear4 (Fea4) (Fig. 5A). GO enrichment analysis demonstrated that the SAM dynamic network is predominantly enriched for genes related to “meristem development,” “meristem initiation,” “meristem maintenance,” “shoot system development,” “hormone-mediated signaling pathway” (gibberellin, ethylene, and auxin), “tissue development” (flower, fruit, seed, leaves), and cell differentiation (Fig. 5B). Together, these findings highlight the biological significance inherent in SAM dynamic networks, providing insights into the functional characteristics of hitherto unclassified genes.
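The filtering criteria for high-confidence associations can be summarized as an intersection of evidence layers. The sketch below is illustrative only: the function name (high_confidence_edges), the representation of each network as a set of (TF, target) pairs, and the example identifiers are hypothetical, and it assumes that "coexisting" in a GRN means that the same TF-target pair is present in that GRN.

```python
def high_confidence_edges(predicted_edges, transcriptome_grn, translatome_grn,
                          sam_tpm, min_tpm=1.0):
    """Keep predicted TF-target edges supported by both GRNs and by expression.

    predicted_edges   : iterable of (tf, target) pairs predicted from
                        tsCUT&Tag and ATAC-seq data
    transcriptome_grn : set of (tf, target) pairs
    translatome_grn   : set of (tf, target) pairs
    sam_tpm           : dict mapping gene -> expression in the SAM (TPM)
    """
    kept = set()
    for tf, target in predicted_edges:
        edge = (tf, target)
        in_both_grns = edge in transcriptome_grn and edge in translatome_grn
        expressed = sam_tpm.get(target, 0.0) > min_tpm  # TPM > 1, as in the text
        if in_both_grns and expressed:
            kept.add(edge)
    return kept


# Hypothetical toy inputs.
predicted = [("TF_A", "gene1"), ("TF_A", "gene2"), ("TF_B", "gene3")]
transcriptome = {("TF_A", "gene1"), ("TF_A", "gene2"), ("TF_B", "gene3")}
translatome = {("TF_A", "gene1"), ("TF_B", "gene3")}
tpm = {"gene1": 12.5, "gene2": 30.0, "gene3": 0.4}

network = high_confidence_edges(predicted, transcriptome, translatome, tpm)
print(network)
```

Here only the TF_A to gene1 edge survives: gene2 lacks translatome support and gene3 is not expressed above the threshold.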

Fig. 5

SAM dynamic regulatory landscape of maize TFs. A, B The dynamic regulatory network in the SAM (A) and its functional annotation by GO enrichment (B). RichFactor represents the ratio of the input gene number in the pathway to the background gene number in the pathway

Networks of SAM-related TFs associated with plant height

Plant height is an important agronomic trait with a complex regulatory mechanism [55]. We collated a list of published functional genes related to maize plant height and divided them into more than 12 distinct modules encompassing hormone-signaling pathways (gibberellin, auxin, brassinosteroid, cytokinin, ethylene, abscisic acid, and strigolactone), phytochrome signaling, as well as pathways involved in vegetative and reproductive development, SAM maintenance and determinacy, microtubule cellulose, sugar metabolism, and nutrition. We then extracted a plant-height functional network from the integrated network based on this list of ~ 78 known functional genes and the TF genes (Fig. 6A, Additional file 15: Table S14). An example of the resulting network, namely for Knox6, illustrates how homeobox TFs exert their influence on plant height through a multitude of distinct pathways (Fig. 6B). Each functional gene appeared to be affected by multiple homeobox TFs, as shown for Vanishing tassel2 (Vt2), demonstrating the complexity of the molecular network governing plant height (Fig. 6C). Vt2, which is involved in auxin biosynthesis, is simultaneously regulated by the KNOX members Knox6 and Hb20 (Fig. 6D). Moreover, Knox6 also bound to the Hb20 locus, indicating that one KNOX TF can regulate another KNOX gene (Fig. 6D). To validate these regulatory relationships, we conducted firefly luciferase (LUC) reporter assays in maize protoplasts. We cloned 2-kb Vt2 and Hb20 promoter regions upstream of the LUC reporter gene and used Knox6 and Hb20 as effectors. Knox6 and Hb20 repressed transcription from the Vt2 promoter, whereas Knox6 activated transcription from the Hb20 promoter (Fig. 6E and F), indicating that Knox6 and Hb20 have regulatory roles in auxin biosynthesis, thereby influencing plant height [56].

Fig. 6

Network of known plant-height functional genes and TFs in maize. A Regulatory network showing the connection between SAM-related TFs and known plant-height functional genes, with different colors indicating the various pathways depicted to the right. B Knox6 regulates diverse biological pathways. Color-coded dots correspond to categories shown in A. C Vt2, a functional target gene, is regulated by multiple TFs. Color-coded dots are the same as in A. D Genome browser view of the association of Knox6 and Hb20 with the Vt2 and Hb20 promoters. E, F Dual-luciferase reporter assay (E) and diagram of the proposed module (F) indicating that transcription from the Vt2 promoter is repressed by Knox6 and Hb20, whereas transcription from the Hb20 promoter is induced by Knox6. “Treatment” refers to the co-transformation of the pM999 vector containing the TF effector and the p0800 vector containing the 2-kb promoter region of the target gene. “Control” refers to the co-transformation of the empty pM999 vector with the p0800 vector containing the 2-kb promoter

Fig. 7

WOX13A regulates the expression of Gn1 and affects maize architecture. A WOX13A expression across different tissues and stages. B Strategy for the CRISPR–Cas9-based isolation of a WOX13A knockout mutant. C Differential expression analysis based on RNA-seq of V6-stage internodes from wild type and wox13a mutant. The position of Gn1 is annotated. D Venn diagram showing the overlap between target genes identified by tsCUT&Tag and DEGs obtained by RNA-seq. E Dual-luciferase reporter assay showing Gn1 transcription is induced by WOX13A. “Treatment” refers to the co-transformation of the pM999 vector containing the TF effector and the p0800 vector containing the 2-kb promoter region of the target gene. “Control” refers to the co-transformation of the empty pM999 vector with the p0800 vector containing the 2-kb promoter. F Representative phenotypes of wild-type KN5585 and the wox13a mutant. Scale bar: 20 cm. G Plant height and ear height of KN5585 and wox13a plants grown in Shandong and Hubei, China. Statistical significance was determined by Student’s t test. **, P < 0.01; ***, P < 0.001

Loss of ZmWOX13A function contributes to plant-height variation in maize

To test the functional hypotheses inferred from the network, we selected WOX13A, which was most highly expressed in internodes and was highly connected in the plant-height transcriptional GRN, suggesting that it might play a role in determining plant height (Fig. 7A). We used CRISPR-Cas9-mediated genome editing to generate a loss-of-function wox13a mutant carrying a 1-bp deletion in the second exon, which causes a frameshift and premature termination of translation (Fig. 7B). We performed RNA-seq on V6-stage internode samples from the wild type (KN5585) and the wox13a mutant and identified 490 DEGs (P < 0.05 and absolute Log2(FC) > 1) (Fig. 7C, Additional file 16: Table S15). We also defined 5839 genes as direct WOX13A targets by tsCUT&Tag; the overlap between the two gene lists identified 74 genes as WOX13A-regulated genes (Fig. 7C and D). Among these was Gn1, a well-known SAM-related and plant-height gene (Fig. 7C and D). Using a transient luciferase-reporter assay with a Gn1pro:LUC reporter and a WOX13A effector, we confirmed that WOX13A activates Gn1 expression (Fig. 7E). To examine the developmental defects of wox13a, we grew wild-type KN5585 and wox13a mutants in the field at two locations in China over two years. Plant height and ear height were consistently and significantly lower in wox13a mutants than in KN5585 controls (Fig. 7F and G). These findings strongly suggest that WOX13A induces Gn1 expression, thereby influencing the overall plant architecture of maize.

Discussion

Attaining optimal plant architecture in maize represents a critical determinant for achieving high yields. The establishment of plant architecture depends largely on the development of the SAM [57, 58]. In maize, some of the cloned SAM-related functional genes belong to the homeobox-protein family, underscoring their role in plant architecture. Toward identifying the molecular networks governing maize architecture, we generated a comprehensive molecular regulatory network and dynamic regulatory networks for key plant-architecture tissues by focusing on 27 SAM homeobox TFs. We compared the generated homeobox integrated network resource with the genes influenced by known SAM functional genes in maize. Our study found that 84.3% (4758/5647) of target genes identified by FEA4 ChIP-seq were present in our homeobox integrated network (Additional file 1: Fig. S15A) [59]. Additionally, 41.7% of the nodes in our generated SAM dynamic network overlapped with FEA4 target genes (Additional file 1: Fig. S15B). Approximately 62.6% of the differentially expressed genes between the grx triple mutant and wild type were also found in the homeobox integrated network (Additional file 1: Fig. S15C) [38]. Similarly, about 58% of DEGs between the ub2/ub3 double mutant tassel and wild-type tassel at the V5 stage were present in the homeobox integrated network (Additional file 1: Fig. S15D) [60]. These findings demonstrate that the targets affected by known SAM functional genes are largely present in the generated homeobox integrated network, validating its biological significance. Furthermore, the homeobox integrated network includes a larger number of network nodes and potential functions, providing an invaluable resource and foundation for in-depth exploration into the multifaceted functions of homeobox TFs within the context of maize growth and development.

tsCUT&Tag offers advantages such as a straightforward experimental procedure, time and cost efficiency, and high throughput. However, it still has some limitations in plant epigenetic studies. Enzymatic digestion of the plant cell wall is a stress treatment that may affect the regulatory networks of certain TFs, especially those related to chemical stimuli [49]. Additionally, tsCUT&Tag relies on the availability of high-quality protoplasts, which can be challenging to obtain given the spatiotemporal specificity of TFs [61]. Integrating data from techniques such as ATAC-seq and RNA-seq from specific tissues can aid in constructing the tissue regulatory landscape of TFs and improve the accuracy of tissue dynamic-network analysis [28]. Although tsCUT&Tag uses a machine-learning model based on ATAC-seq and RNA-seq data that can predict TF targets across different tissues and stages with an accuracy of ~ 70% [28], these predictions should be interpreted with caution because they include some false positives.

Gene duplications constitute a fundamental mechanism underlying the expansion of gene families and the diversification of gene functions, thereby serving as a significant catalyst for genome evolution. A deep understanding of DNA-binding specificity among distinct TF subfamily members is crucial for characterizing genetic redundancy and diversity and the functional differentiation of duplicated genes. In this study, we established that members of the same subfamily exhibit only relatively minor differences in binding-site specificity and target genes, indicating a substantial degree of functional redundancy within individual homeobox TF subfamilies. Furthermore, we observed contrasting fates of duplicated SAM-related homeobox genes at the transcriptome and regulome levels. Kn1 and Lg3 are an example of a divergent duplication, with Kn1 regulating Lg3 expression to ensure proper inflorescence growth, indicative of subfunctionalization.

Dissecting regulatory networks is an efficient method for exploring highly dynamic and complex principles of functional genomics [62]. Here, we curated an expansive regulatory network encompassing 27 homeobox TFs, comprising 235,043 edges and 18,463 nodes, reflecting a complexity and redundancy of regulation that is consistent with observations in animals [63,64,65]. Compared to other investigations of homeobox functional genes [17, 19, 20], the GRNs we identified encapsulate a comprehensive spectrum of genetic-information flow across diverse pathways. Notably, all TFs investigated here regulate at least three distinct biological pathways, and at least two TFs converge to control genes within the same pathway, collectively revealing a multifaceted mode of transcriptional coordination.

Rational deployment of favorable gene combinations may contribute to functional enhancements. Our understanding of TFs remains somewhat limited in the context of integrated networks, prompting us to integrate additional datasets to create dynamic regulatory networks across various maize growth stages and tissues. The networks within distinct tissues harbor distinct functional pathways, comprising both known functional genes and genes with unknown functions. Such networks aid in the rapid discovery of functional genes and their regulatory relationships. Exploring the functions of target genes is a valuable tool for predicting the biological roles associated with individual TFs. Through empirical validation, we demonstrated that Knox6 and Hb20 repress the transcription of Vt2, a key functional gene involved in auxin biosynthesis. This empirical confirmation reinforces a model wherein Knox6 and Hb20 participate in the regulation of auxin biosynthesis. We generated a wox13a mutant via gene editing, which demonstrated that WOX13A influences plant height through its regulation of the functional gene Gn1. These findings collectively underscore the powerful predictive capabilities of GRNs in delineating the functions of TFs and in mapping the complex regulatory interplay among TFs themselves, as well as between TFs and their target genes, within diverse biological pathways.

Conclusions

In summary, we generated a comprehensive dynamic regulome of SAM-related homeobox genes using tsCUT&Tag data for maize homeobox genes. By reproducing the regulatory associations between plant-height functional genes and SAM-related homeobox genes, our findings contribute to the identification and characterization of regulatory mechanisms governing plant architecture. We identify a novel role for WOX13A in the control of plant height, further assisting rational improvement of agronomic traits.

Methods

Plant materials and vector construction

Seeds of maize (Zea mays inbred line B73) were planted in a growth chamber in controlled conditions (25 °C, in continuous darkness) for approximately 10 days. The secondary yellow leaves were used for protoplast isolation and tsCUT&Tag. B73 seeds were also planted in a greenhouse under local environmental conditions in Wuhan. Samples from the shoot apical meristem of V4-stage seedlings, as well as the internode and stem of V5-stage seedlings, were collected for ATAC-seq. KN5585 and the wox13a mutant were planted in a greenhouse, also under local environmental conditions. Internode samples from V6-stage plants were harvested for RNA-seq. KN5585 and wox13a were cultivated for phenotypic observation in Zibo, Shandong during the spring of 2022, and in both Zibo, Shandong, and Wuhan, Hubei (China) during the spring of 2023.

To construct 32 maize vectors, the genes listed in Additional file 4: Table S3 were amplified from the maize TFome using primers TF-F and TF-R, as previously described [42]. Full-length coding sequences were cloned into linearized pM999-GFP at the XbaI restriction site through homologous recombination. Transformation-grade plasmids were then prepared according to previously described methods [54].

Protoplast isolation, transformation, and tsCUT&Tag assay

Isolation and transformation of maize protoplasts were performed as previously described [28]. For CUT&Tag assays, we followed the tsCUT&Tag procedure [28], which is based on the CUT&Tag procedure with adjustments [66]. The CUT&Tag assays were conducted using a commercial kit (Vazyme, In-Situ ChIP Library Prep Kit for Illumina, TD901-TD902). The TF–target DNA libraries were constructed with distinct indexes (Vazyme, TruePrep™ Index Kit V4 for Illumina, TD204) and sequenced on an Illumina NovaSeq platform at Annoroad (Beijing, China).

ATAC-seq

The construction of ATAC-seq libraries for the SAM, stem and internode samples was performed following a previously established protocol with some modifications [67, 68]. In brief, native nuclei were extracted and pelleted by centrifugation and then resuspended in 20 μL 1 × TTBL buffer (VAHTS, TD501). For each library, ~ 10,000 nuclei were treated with Tn5 (VAHTS, TD501) in the presence of 0.3% v/v Triton X-100 at 37 °C for 30 min. Samples were purified using a Qiagen MinElute kit and then amplified to construct a library with a TruePrep DNA Library Prep Kit V2 for Illumina sequencing (TD501, Vazyme). The resulting amplicons underwent size selection using DNA clean beads (KAPA) and were prepared for 150-bp paired-end sequencing.

Analysis of TF peaks and open chromatin

Clean reads were aligned to the B73 maize reference genome version 5.0 using Bowtie2 software [69,70,71]. Aligned reads were filtered to retain those with mapping quality (MAPQ) > 30 and to exclude reads mapping to multiple locations in the genome by retaining reads with the AS:i:<N> tag and removing those with the XS:i:<N> tag. We used BEDTools to eliminate reads at the TF site to mitigate vector contamination [72]. SAMtools rmdup was applied to discard duplicated sequences, yielding the BAM file for subsequent analysis [73]. TF peak calling was performed using MACS2 (version 2.1.1) with control background subtraction and specifying the genome size (-g 2.1e+9) [74]. MACS2 peaks were resized to ~ 150 bp around the summit, and signal values were aggregated. The merged-peak calling criteria employed a 1% (0.01) irreproducible discovery rate (IDR) threshold. Open-chromatin regions were identified with a cutoff P value of 0.00001 using MACS2 with the --call-summits option [74]. Target genes were defined as the closest gene with a peak summit within 2 kb upstream or downstream of the TSS and were annotated via ChIPseeker [75]. Cross-correlation analysis was conducted using phantompeakqualtools, yielding two metrics, the normalized strand cross-correlation coefficient (NSC) and the relative strand cross-correlation coefficient (RSC) [76]. Conversion of .bam files to .bigwig files was carried out using deepTools bamCoverage, applying reads per genomic content (RPGC) normalization [77]. Heatmaps were generated using deepTools computeMatrix, plotHeatmap, and plotProfile [77]. Motif analysis was conducted using MEME-ChIP with the following parameters: -meme-mod anr -meme-minw 4 -meme-maxw 15 -meme-nmotifs 10 -meme-p 8. Input FASTA sequence files for MEME-ChIP were generated by extracting 50 bp of sequence upstream and downstream of the peak summit using BEDTools getfasta [72, 78].
To evaluate the reproducibility between biological replicates, Pearson’s correlation coefficients were calculated using deepTools (version 3.2.0) multiBamSummary. Biological replicates with Pearson’s correlation coefficient ≥ 0.8 were retained for further analyses.
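
The target-gene rule described above (closest gene with a peak summit within 2 kb of the TSS) can be illustrated with a minimal Python sketch. This is not the ChIPseeker implementation used in the study: the function name `assign_target`, the gene names, and the coordinates are hypothetical, and the sketch assumes a single chromosome and ignores strand.

```python
from typing import Optional

WINDOW_BP = 2000  # 2-kb window on either side of the TSS, as stated in the text


def assign_target(summit: int, tss_by_gene: dict[str, int]) -> Optional[str]:
    """Return the gene whose TSS is closest to the summit, if within the window."""
    best_gene, best_dist = None, WINDOW_BP + 1
    for gene, tss in tss_by_gene.items():
        dist = abs(summit - tss)
        if dist <= WINDOW_BP and dist < best_dist:
            best_gene, best_dist = gene, dist
    return best_gene


# Hypothetical TSS coordinates on one chromosome (illustration only)
tss = {"geneA": 10_000, "geneB": 13_000, "geneC": 50_000}

print(assign_target(11_200, tss))  # geneA (1,200 bp away; geneB is 1,800 bp away)
print(assign_target(30_000, tss))  # None (no TSS within 2 kb)
```

A summit that lies within 2 kb of two TSSs is assigned only to the nearer gene, which matches the "closest gene" wording of the definition.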

Identification of hub genes and hot regions

Hub-target genes were identified following previously described procedures [46, 79]. Hub genes were defined as genes targeted by more TFs than the 99th percentile of the maximal value in 100 randomizations of the columns in the TF-to-gene matrix, while preserving the total number of TF targets in each randomization. Hot regions were identified using the same strategy. Specifically, hub genes were defined as genes targeted by more than 13 TFs, while hot regions were defined as regions bound by more than 12 TFs. Hot regions were screened using candidate enhancers identified previously [80].
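
The randomization strategy can be sketched as follows. This is one possible reading of the procedure rather than the code used in the study: each TF column of a binary gene-by-TF matrix is shuffled independently (which preserves that TF's number of targets), the largest per-gene TF count is recorded for each of 100 randomizations, and the 99th percentile of those maxima is taken as the cutoff. The function name `hub_threshold`, the nearest-rank percentile, and the toy matrix with three planted hubs are all assumptions made for illustration.

```python
import math
import random


def hub_threshold(matrix, n_rand=100, percentile=99, seed=0):
    """matrix[g][t] is 1 if TF t targets gene g, else 0."""
    rng = random.Random(seed)
    n_genes, n_tfs = len(matrix), len(matrix[0])
    columns = [[matrix[g][t] for g in range(n_genes)] for t in range(n_tfs)]
    maxima = []
    for _ in range(n_rand):
        counts = [0] * n_genes
        for col in columns:
            shuffled = col[:]
            rng.shuffle(shuffled)  # keeps this TF's total number of targets
            for g, value in enumerate(shuffled):
                counts[g] += value
        maxima.append(max(counts))
    maxima.sort()
    rank = max(1, math.ceil(percentile / 100 * len(maxima)))  # nearest-rank percentile
    return maxima[rank - 1]


# Toy data: 200 genes, 10 TFs with 20 random targets each, plus 3 planted hubs
rng = random.Random(1)
n_genes, n_tfs = 200, 10
toy = [[0] * n_tfs for _ in range(n_genes)]
for t in range(n_tfs):
    for g in rng.sample(range(3, n_genes), 20):
        toy[g][t] = 1
planted = [0, 1, 2]
for g in planted:
    toy[g] = [1] * n_tfs  # targeted by every TF

threshold = hub_threshold(toy)
hubs = [g for g, row in enumerate(toy) if sum(row) > threshold]
print(threshold, hubs)
```

The same function could be applied to a region-by-TF matrix to obtain a cutoff for hot regions.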

RNA-seq and differential expression analysis

To generate an expression map throughout the growth period, RNA-seq data were compiled from multiple sources [41, 81]. RNA-seq data obtained from 10 functional domains of the SAM were included [82]. Ribo-seq data from 16 tissues or stages were used as previously reported [41]. Internode samples were collected in triplicate from KN5585 and the wox13a mutant at the V6 stage. Total RNA was extracted using a Direct-zol RNA Microprep kit (Zymo Research) following the manufacturer’s instructions. RNA-seq libraries were prepared and sequenced as 150-bp paired-end reads on the BGISEQ-500RS platform at BGI (Shenzhen, China).

For all RNA-seq analyses, RSEM [83] was used to align the data to the B73 reference genome version 5.0 and estimate gene-expression levels. Genes differentially expressed between WT and mutant were identified with DESeq2 [84] with the following parameters: P value < 0.05, absolute Log2(Fold-Change) > 1. GO-term enrichment analysis was conducted using singular-enrichment analysis with agriGO (version 2.0, http://systemsbiology.cau.edu.cn/agriGOv2/), with the threshold set at FDR < 0.01 [85].
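
As an illustration of how the DEG thresholds are applied and how DEGs can be intersected with TF-bound genes, a minimal Python sketch is given below. It does not reproduce DESeq2; it only filters a table of already-computed statistics. The gene identifiers and values are hypothetical, as are the names `is_deg` and `bound`.

```python
def is_deg(p_value: float, log2_fc: float,
           p_cutoff: float = 0.05, lfc_cutoff: float = 1.0) -> bool:
    """Apply the thresholds from the text: P < 0.05 and |Log2(FC)| > 1."""
    return p_value < p_cutoff and abs(log2_fc) > lfc_cutoff


# Hypothetical per-gene statistics: (P value, Log2 fold change)
results = {
    "gene1": (0.001, 2.3),
    "gene2": (0.20, 3.0),   # fails the P-value cutoff
    "gene3": (0.01, -1.5),
    "gene4": (0.03, 0.4),   # fails the fold-change cutoff
}

degs = {gene for gene, (p, lfc) in results.items() if is_deg(p, lfc)}

# Hypothetical set of genes bound by the TF (e.g., from tsCUT&Tag)
bound = {"gene1", "gene4", "gene5"}

# Genes that are both bound and differentially expressed
regulated = degs & bound
print(sorted(degs), sorted(regulated))
```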

TF co-regulation

The degree of colocalization between TFs was quantified within each sample. The first step in deciphering the combinatorial regulatory code is to identify where TFs bind. Initially, a heatmap illustrating the enrichment level of genome-wide peak regions for the 27 TFs was generated in R. For the coregulatory matrix, the TFs were clustered based on the Jaccard distance (1 − Jaccard index) between their peak sets using average-linkage hierarchical clustering in R.
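
The clustering step was carried out in R; the following Python sketch only illustrates the underlying computation. It assumes that each TF's peak set has been reduced to a set of identifiers of merged peak regions (a simplification of interval overlap), and it uses a naive average-linkage procedure written for clarity rather than speed. TF names and peak identifiers are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a: set, b: set) -> float:
    """1 - |A ∩ B| / |A ∪ B|; defined as 0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(names, dist):
    """Naive agglomerative clustering; returns merges as (cluster1, cluster2, distance)."""
    def avg(c1, c2):
        total = sum(dist[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg(*pair))
        merges.append((c1, c2, avg(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical peak sets (identifiers of merged peak regions)
peaks = {"TF1": {1, 2, 3, 4}, "TF2": {2, 3, 4, 5}, "TF3": {7, 8, 9}}
dist = {frozenset((a, b)): jaccard_distance(peaks[a], peaks[b])
        for a, b in combinations(peaks, 2)}

for c1, c2, d in average_linkage(list(peaks), dist):
    print(c1, c2, round(d, 3))
```

In this toy example, TF1 and TF2 share three of five peaks and are merged first; TF3 shares no peaks with either and joins last at the maximal distance.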

Evaluation of gene divergence

Duplications were selected from three subfamilies based on the genome duplications reported previously [51]. To predict cis-acting elements within the promoter regions, 1-kb regions upstream of the transcription start site were extracted for each gene using BEDTools, followed by analysis with PlantCARE [86]. To investigate the variation in spatiotemporal expression of duplicated genes, the SciPy module in Python was used to compute the Pearson’s correlation coefficient (R) between pairs of duplicated genes across different tissues. For a more in-depth exploration of regulatory divergence in duplicated genes, the divergence score (DS) index was introduced, defined as previously reported [52]: \(\text{DS}=1-\frac{\text{Number of target genes shared by the two TFs}}{\text{Total number of target genes of the two TFs}}\).
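
A minimal Python sketch of the DS calculation is shown below. It assumes that the total number of genes in the denominator is the union of the two target sets (under this assumption DS equals the Jaccard distance); if the denominator were instead the sum of the two set sizes, the values would differ. The function name and gene identifiers are hypothetical.

```python
def divergence_score(targets_a: set, targets_b: set) -> float:
    """DS = 1 - shared / total, with 'total' taken as the union (an assumption)."""
    total = targets_a | targets_b
    if not total:
        raise ValueError("both target sets are empty")
    return 1.0 - len(targets_a & targets_b) / len(total)


# Hypothetical target sets for a pair of duplicated TFs
tf_a = {"g1", "g2", "g3", "g4"}
tf_b = {"g3", "g4", "g5", "g6"}

ds = divergence_score(tf_a, tf_b)  # 2 shared genes out of 6 in total
print(round(ds, 3))
```

Identical target sets give DS = 0 and non-overlapping target sets give DS = 1, so larger values correspond to less shared regulation.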

Additionally, 100 gene pairs were randomly generated from the TFs and their DS values were calculated; the DS values were fitted to a normal distribution using the MASS and lattice R packages. The 95th percentile of the DS-value distribution was selected as the threshold for statistically significant regulatory differentiation among duplicated gene pairs, with DS values indicating divergent (> 0.863), conserved (< 0.515), or intermediate levels of genetic differentiation (0.515 to 0.863) between duplicated genes.

Inference of an integrated transcription-regulatory network and tissue-dynamic networks

The integrated transcription-regulatory network combines TF–target gene pairs identified through tsCUT&Tag and published Kn1 ChIP data with co-expression and co-translation gene pairs from omics databases [17, 41]. Regulatory weights were obtained by normalizing the signal values using min–max scaling. Conformance of the network to a power-law distribution was verified using Python.
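
Min–max scaling maps each signal value x to (x − min) / (max − min), so that weights fall between 0 and 1. A minimal Python sketch follows; the function name and the signal values are hypothetical, and returning zeros when all values are equal is a choice made here to avoid division by zero.

```python
def min_max_scale(values):
    """Rescale values to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # degenerate case: all values identical
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical peak signal values for four TF-target edges
signals = [5.0, 20.0, 12.5, 35.0]
weights = min_max_scale(signals)
print(weights)  # [0.0, 0.5, 0.25, 1.0]
```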

To predict the dynamic binding sites of each TF in different tissues, we used PyTorch to construct a simple Multi-layer Perceptron (MLP) model to perform supervised learning on the tsCUT&Tag and ATAC-seq data. We trimmed the peaks of tsCUT&Tag and ATAC-seq to 100-bp upstream and downstream of the peak summit. The training data were generated by overlapping peak sequences between tsCUT&Tag and ATAC-seq. Peaks in the yellow leaves’ open chromatin were labeled as 1, and open-chromatin sequences without TF binding were labeled as 0, allowing the model to learn the features of sequence binding. These labeled sequences were then randomly divided into training and test sets. The trained model was then used to predict TF binding sites by analyzing the ATAC-seq peaks from the SAM.
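
The construction of the labeled data set can be sketched in plain Python as follows. The sketch covers only the trimming, labeling, and random-split steps, not the MLP itself. It assumes summits on a single chromosome, and the test fraction of 0.2, the function names, and the coordinates are hypothetical, since these details are not specified above.

```python
import random

FLANK = 100  # bp kept on each side of the peak summit, as in the text


def window(summit: int) -> tuple:
    """Trim a peak to 100 bp upstream and downstream of its summit."""
    return (max(0, summit - FLANK), summit + FLANK)


def overlaps(a: tuple, b: tuple) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def label_open_regions(atac_summits, tf_summits):
    """Label each open-chromatin window 1 if it overlaps a TF peak, else 0."""
    tf_windows = [window(s) for s in tf_summits]
    labeled = []
    for summit in atac_summits:
        w = window(summit)
        label = 1 if any(overlaps(w, t) for t in tf_windows) else 0
        labeled.append((w, label))
    return labeled


def train_test_split(examples, test_fraction=0.2, seed=0):
    """Randomly divide examples into training and test sets."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical summit coordinates
atac_summits = [1000, 5000, 9000, 15000, 20000]
tf_summits = [1050, 9150]

examples = label_open_regions(atac_summits, tf_summits)
train, test = train_test_split(examples)
print([label for _, label in examples], len(train), len(test))
```

In practice, the sequence under each window would then be encoded (for example, one-hot) before being passed to the model.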

To construct tissue-specific dynamic regulatory networks, expression levels were integrated from various tissue samples. For the SAM, expression levels were integrated from the 10 SAM domains, as well as the SAMs of V1- and V3-stage seedlings. After using the machine-learning model to predict the binding sites of each TF in the SAM, we required that the predicted target genes also show an overall expression level above the threshold (TPM > 1). Additionally, these genes had to overlap with the GRN generated by GENIE3 using gene-expression and translation levels, as previously described [42, 73]. The regulatory weight is the sum of the weights identified by tsCUT&Tag and the weights predicted by the transcriptome and translatome GRNs.
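
The filtering and weighting rules for a tissue network can be sketched as below. The data structures are assumptions made for illustration: predicted edges are (TF, gene) pairs, `grn_weight` stands for a GENIE3-derived weight that already combines the transcriptome and translatome GRNs, and all names and values are hypothetical.

```python
TPM_MIN = 1.0  # expression threshold from the text (TPM > 1)


def dynamic_edges(predicted, tpm, grn_weight, binding_weight):
    """Keep predicted edges whose target is expressed and supported by the GRN."""
    edges = {}
    for tf, gene in predicted:
        if tpm.get(gene, 0.0) <= TPM_MIN:
            continue  # target not expressed above the threshold
        if (tf, gene) not in grn_weight:
            continue  # edge not supported by the expression-based GRN
        # Regulatory weight: binding-derived weight plus GRN-derived weight
        edges[(tf, gene)] = binding_weight.get((tf, gene), 0.0) + grn_weight[(tf, gene)]
    return edges


# Hypothetical inputs
predicted = [("TF1", "g1"), ("TF1", "g2"), ("TF1", "g3")]
tpm = {"g1": 5.0, "g2": 0.5, "g3": 8.0}
grn_weight = {("TF1", "g1"): 0.25, ("TF1", "g2"): 0.5}
binding_weight = {("TF1", "g1"): 0.5, ("TF1", "g2"): 0.75, ("TF1", "g3"): 0.25}

edges = dynamic_edges(predicted, tpm, grn_weight, binding_weight)
print(edges)  # {('TF1', 'g1'): 0.75}
```

Here g2 is dropped because its expression is below the threshold and g3 is dropped because the edge is absent from the GRN.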

Transient luciferase-reporter assays

Transient-expression assays were conducted using maize leaf protoplasts [87]. The promoters of Hb20, Vt2, and Gn1 were cloned upstream of firefly luciferase (LUC) via the KpnI and PstI sites in pGreenII 0800-LUC through homologous recombination. The full-length coding sequences of Knox6, Hb20, and WOX13A were cloned into the pGreenII PM999 vector through homologous recombination. Plasmids were mixed in appropriate combinations and transformed into protoplasts. After 14 h of incubation in darkness, protoplasts were collected and processed using a Dual-Luciferase Reporter Gene Assay Kit (DL101, Vazyme) following the manufacturer’s recommended protocol. LUC and REN activities were measured using a microplate reader. “Treatment” refers to the co-transformation of the pM999 vector containing the TF effector and the p0800 vector containing the 2-kb promoter region of the target gene. “Control” refers to the co-transformation of the empty pM999 vector with the p0800 vector containing the 2-kb promoter.

....