PISTILLATA paralogs in Tarenaya hassleriana have diverged in interaction specificity
Floral organs are specified by MADS-domain transcription factors that act in a combinatorial manner, as summarized in the (A)BCE model. However, this evolutionarily conserved model is in contrast to a remarkable amount of morphological diversity in flowers. One of the mechanisms suggested to contribute to this diversity is duplication of floral MADS-domain transcription factors. Although gene duplication is often followed by loss of one of the copies, sometimes both copies are retained. If both copies are retained they will initially be redundant, providing freedom for one of the paralogs to change function. Here, we examine the evolutionary fate and functional consequences of a transposition event at the base of the Brassicales that resulted in the duplication of the floral regulator PISTILLATA (PI), using Tarenaya hassleriana (Cleomaceae) as a model system.
The transposition of a genomic region containing a PI gene led to two paralogs which are located at different positions in the genome. The original PI copy is syntenic in position with most angiosperms, whereas the transposed copy is syntenic with the PI genes in Brassicaceae. The two PI paralogs of T. hassleriana have very similar expression patterns. However, they may have diverged in function, as only one of these PI proteins was able to act heterologously in the first whorl of A. thaliana flowers. We also observed differences in protein complex formation between the two paralogs, and the two paralogs exhibit subtle differences in DNA-binding specificity. Sequence analysis indicates that most of the protein sequence divergence between the two T. hassleriana paralogs emerged in a common ancestor of the Cleomaceae and the Brassicaceae.
We found that the PI paralogs in T. hassleriana have similar expression patterns, but may have diverged at the level of protein function. Data suggest that most protein sequence divergence occurred rapidly, prior to the origin of the Brassicaceae and Cleomaceae. It is tempting to speculate that the interaction specificities of the Brassicaceae-specific PI proteins differ from those of the PI proteins found in other angiosperms. This could lead to PI regulating partly different genes in the Brassicaceae, and ultimately might result in changes in floral morphology.
Keywords: PISTILLATA, Flower development, Gene duplications, Paralogs, MADS, Tarenaya, Cleomaceae
Gene duplication can give rise to evolutionary novelty. Selection pressure is temporarily weaker after a duplication event, allowing one or both of the duplicates to evolve in function. Following a gene duplication event, there are several scenarios for the fate of the newly obtained paralogs. Often, one of the paralogs is quickly lost. If both paralogs are retained, they might divide the original function between them (subfunctionalization) and/or obtain new functions (neofunctionalization) [2, 3]. Different molecular mechanisms have been proposed to explain how this is achieved [4, 5]. These mechanisms are not mutually exclusive, and often several of them act on the two paralogs simultaneously or consecutively [6, 7].
In plants, whole genome duplication (WGD) is a common phenomenon, and all angiosperms have undergone at least one WGD [8, 9]. WGDs are implicated as a driving force behind the dramatic increase in the number of plant species [10, 11]. Crucially, several key innovations, such as seeds and flowers, coincided with WGDs [12, 13, 14]. Interestingly, gene loss after WGDs is not uniform, with some classes of genes being preferentially retained, among which are genes encoding transcription factors (TFs) [5, 15, 16]. One example is the family of MADS-domain TFs [17, 18]. Members of this TF family are involved in virtually all stages of plant development and are well-known for their crucial roles in flower development. They specify the identities of the four different floral organ types in a combinatorial manner according to the (A)BCE model [20, 21, 22]. Mechanistically, these TFs achieve this by binding to the promoters of their target genes as organ-specific tetrameric protein complexes, as proposed in the floral quartet model.
Based on mutant phenotypes and expression patterns, the functions of A-, B-, C- and E-class proteins are largely conserved throughout the angiosperms . However, many plant lineages retained multiple copies of these genes after duplication events [24, 25, 26]. The first floral MADS-box gene paralogs studied in detail were the Antirrhinum majus genes PLENA (PLE) and FARINELLI (FAR). Whereas PLE is the C-function gene in Antirrhinum, mutations in the closely related FAR gene, surprisingly, only resulted in plants with partial male sterility [27, 28]. FAR is partially redundant with PLE. However, mutations in these genes result in different mutant plant phenotypes, and the proteins exhibited different capabilities to homeotically specify floral organs, caused by differences in protein-protein interactions . These data suggest that PLE and FAR have subfunctionalized . Arabidopsis also retained paralogous pairs from the C-, as well as the A- and E-classes of MADS-box genes. The paralogous gene pairs show different degrees of divergence. The paralogs of the C-class gene AGAMOUS (AG), the SHATTERPROOFs (SHP1 and 2), are not involved in floral organ specification, but play a role in carpel and fruit development . In contrast, the four SEPALLATA paralogs (SEP1–4, E-class) act in a largely redundant manner [31, 32].
The B-function is fulfilled by two genes in eudicot model species - APETALA3 (AP3) and PISTILLATA (PI) in Arabidopsis thaliana [33, 34], and DEFICIENS (DEF) and GLOBOSA (GLO) in Antirrhinum [35, 36]. The AP3/DEF and PI/GLO gene lineages resulted from a duplication before the origin of the angiosperms. Within angiosperms, both AP3- and PI-lineages underwent additional duplications, for example close to the origin of core eudicots . Although A. thaliana has only one copy of each B-class gene lineage, other plant species have retained paralogs after these older duplication events [24, 37, 38]. For example, all Solanaceae species have two AP3-like genes (AP3 and TM6), as well as two GLO paralogs of more recent origin. These paralogs have subfunctionalized in a partly species-specific manner [39, 40, 41]. A similar pattern is seen in basal asterids, where a duplication led to two PI paralogs that show species-specific differences in expression patterns .
B-class genes do not only specify petal and stamen identity but can also be involved in determining the morphology of these organs. For instance, in Petunia hybrida a PI paralog is required for the fusion of stamens to the corolla tube . Another example of involvement of B-class genes in morphology is provided by orchids. Orchids possess a perianth that consists of three morphologically distinct types of tepals, and it has been shown that these different tepal morphologies are specified by different combinations of the three to four DEF paralogs that are present in orchids [42, 43]. B-class genes might even be able to specify novel floral organ types that are only observed in one species or genus. An example is presented by Aquilegia, a basal eudicot that displays an additional type of organ in a whorl between the stamens and carpels, called the staminodium. The specification of this new organ in Aquilegia is linked to duplications in the AP3 lineage [44, 45].
The position of a gene within the genome can be biologically relevant, as genes are dependent on their genomic context for expression. Gene expression is regulated by cis-regulatory elements (CREs), which can be dispersed over long distances, even spanning several genes . Epigenetic marks also play a role in regulating gene expression, and these can be different for paralogs located in different parts of the genome. Indeed, in humans it has been found that histone modifications can differ between the original sequence and the copy for segmental duplications . Interestingly, several studies have shown that after gene duplication, the original gene is more evolutionary constrained in sequence than the copy . For these reasons, exceptionally strong conservation of gene order could indicate that the genomic context of a gene is important for its function and/or regulation .
This preservation of gene order in different species is called synteny [50, 51, 52]. Synteny can be maintained across hundreds of millions of years, e.g. with 90% of the genome being syntenic between humans and mice (their ancestral species diverged 90 million years ago (MYA)). However, in plants synteny is generally less conserved than in animals. This is because several rounds of WGD have occurred in plants, and the subsequent process of gene loss and genome rearrangements has blurred syntenic relationships [51, 54, 55]. Still, extensive genome collinearity can be found between closely related species, and plant species that are more distantly related show microsynteny of small genomic regions (of several genes) [55, 56, 57].
In comparative genomics, synteny is used to distinguish true orthologs from other homologous genes. Therefore, synteny can provide information about the evolution of gene families. One family for which synteny analysis has helped unravel its evolutionary history is the family of MADS-box genes [58, 59]. One MADS-box gene that displays conserved synteny is the floral B-class gene PISTILLATA (PI). The synteny of this gene is retained between Amborella, the sister species of all other angiosperms, and almost all other angiosperm species, with the notable exception of the Brassicaceae family.
Tarenaya hassleriana belongs to the Cleomaceae, which is a sister family to the Brassicaceae . T. hassleriana is interesting for comparative studies, as the genome is available, and this species diverged from the Brassicaceae relatively recently (35 million years ago) [62, 63]. This means that T. hassleriana is relatively closely related to the well-established model species A. thaliana. In contrast to the stereotypic decussate organ arrangement in Brassicaceae flowers, Cleomaceae species exhibit quite diverse floral morphologies . T. hassleriana’s basic floral bauplan (4 sepals, 4 petals, 6 stamens and 2 fused carpels) is similar to A. thaliana’s, but in contrast to the disymmetric Arabidopsis flowers, the flowers of Cleomaceae species display bilateral symmetry . T. hassleriana has two paralogs of the B-function gene PI . These PI paralogs are probably derived from the At-β-duplication at the origin of the Brassicales, ca. 70 MYA .
We questioned whether the genomic location of PI could influence its function, as the synteny of PI is conserved throughout the angiosperms, with the exception of the Brassicaceae. T. hassleriana has two PI paralogs, and being closely related to A. thaliana, may have traits that are intermediate between the Brassicaceae and other eudicots. Here, we investigated how these two PI paralogs in T. hassleriana diverged from each other, focusing on expression patterns and several functional features of the two TFs. Our data indicate that both PI paralogs have similar expression patterns, but diverged from each other in their biochemical properties, which could imply divergence in gene function. This finding has interesting implications for the functional evolution of PI genes in the Brassicaceae.
Phylogenetic analysis shows that one of the Tarenaya PI paralogs clusters with the Brassicaceae PI genes
The two ThPI paralogs did not diverge in expression pattern
It is known that the genomic context of a gene may influence its expression [69, 70]. As the T. hassleriana PI paralogs have different genomic environments, we investigated whether they diverged from each other in expression pattern. It was previously shown that both PI paralogs in T. hassleriana are expressed during flower development. More specifically, RT-qPCR data showed that these genes are expressed differentially during flower development, as well as in mature petals and stamens . Here, we investigated the expression patterns of these genes in more detail, using RNA in situ hybridization.
We designed probes for ThAP3, ThPI-1 and ThPI-2. The ThAP3 probe cross-hybridizes with transcripts from both ThAP3 paralogs, since the high similarity between these genes did not allow for the design of specific probes (see Additional file 1: Figure S1). The ThAP3 probe covered part of the sequence encoding the K-domain and the C-terminus, as well as the 3’ UTR. The probes for the PI paralogs only covered the C-terminal part of the mRNA and the 3’ UTR (which we determined using 3’RACE), not the region coding for the K-domain. Although these two PI probes share only 65% similarity at the nucleotide level (the longest continuous stretch of identical sequence is 14 bp), we cannot exclude some cross-hybridization with mRNA from the other paralog.
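The probe comparison above rests on one simple quantity: the longest uninterrupted run of identical bases between two aligned probe sequences. A minimal sketch of how such a value could be computed is given below. The function name and the toy sequences are hypothetical, and this is not the procedure reported by the authors, who do not state how the 14 bp figure was obtained.

```python
def longest_identical_run(aligned_a: str, aligned_b: str) -> int:
    """Return the length of the longest ungapped stretch of identical
    positions between two pre-aligned sequences of equal length.

    Gap characters ('-') break a run even when both sequences carry one.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    best = current = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == y and x != "-":
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


# Toy example with made-up sequences (not the ThPI probes).
toy_a = "ATGCCGTA-GGA"
toy_b = "ATGACGTATGGA"
print(longest_identical_run(toy_a, toy_b))
```

A short maximum run, combined with low overall similarity, is what makes cross-hybridization less likely, although it cannot be ruled out on sequence grounds alone.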
Heterologous expression of ThPI paralogs in A. thaliana results in different phenotypes
The ThPI paralogs differ in their ability to form protein-protein interactions
The heterologous expression assay indicates that the ThPI paralogs might functionally differ. As MADS-domain TFs function as part of protein complexes [23, 74, 75], we tested whether the two ThPI paralogs have different capabilities to form protein complexes. Differences in protein complex formation can be relevant, as divergence in protein-protein interactions may lead to divergent gene regulation. Because TFs need to bind DNA to exert their functions, DNA-binding protein complexes were analyzed using Electrophoretic Mobility Shift Assays (EMSAs), a well-established method to study DNA-binding of MADS-domain protein complexes [36, 76].
Although there are no apparent differences between ThPI-1 and ThPI-2 in their ability to form heterodimers with ThAP3 paralogs, it is possible they have different abilities to form larger protein complexes. According to the floral quartet model, B-class proteins act in tetramers with other MADS-domain TFs [23, 79]. To determine whether the ThPI paralogs differ in higher-order complex formation, we investigated their ability to form complexes with members of other homeotic protein classes. According to the floral quartet model, we expect the B-class proteins to interact with a SEPALLATA (SEP) protein and APETALA1 (AP1) in petals, whereas a complex of the B-class proteins with one AGAMOUS (AG) and one SEP protein should specify stamens. T. hassleriana has one AG gene and two genes each for SEP1/2, SEP3 and SEP4 . Focusing on the stamen-specific complex, we analyzed whether the ThPI paralogs interact differently with ThAG and the two ThSEP3 paralogs. SEP3 was chosen as a representative SEP protein, as it is suggested to be the most active A. thaliana SEP protein based on the number of different protein-interactions it forms . First, we compared higher-order complex formation of all four different B-class heterodimers. As expected, the two ThAP3 paralogs behaved similarly in these experiments (Additional file 3: Figure S2A). However, the two ThPI paralogs showed differences in higher-order complex formation. Whereas one higher-order complex (besides a dimer complex) was observed for combinations containing ThPI-1, two tetrameric complexes were observed when ThPI-2 was present. This pattern was found for both ThSEP3 paralogs (Additional file 3: Figure S2A). We studied the composition of these different complexes in more detail for one of the ThSEP3 paralogs (Th1528) (Fig. 6b, Additional file 3: Figure S2B, C). 
Using dropout experiments, it could be concluded that the single tetrameric complex observed with ThPI-1 consists of ThAP3/ThPI-1/ThAG/ThSEP3, which is the expected complex for stamen-specification. A similar complex (ThAP3/ThPI-2/ThAG/ThSEP3) was observed with ThPI-2 (Fig. 6b, marked with an asterisk). However, when ThPI-2 is present, a second tetrameric complex (upper band) is observed. This other complex does not contain any B-class proteins, but instead consists of only ThSEP3 and ThAG. The fact that a ThAG/ThSEP3 tetramer is formed in addition to a ThAG/ThSEP3/ThAP3/ThPI-2 complex suggests that a fraction of ThSEP3/ThAG dimers bind to each other, instead of to a ThAP3/ThPI-2 dimer. These data indicate that there are differences between the two ThPIs in affinity to form a higher-order complex with AG/SEP3. The affinity of ThAG/ThSEP3 for ThPI-2 is lower than for ThPI-1, because for ThPI-1 all ThAG/ThSEP3 dimers are incorporated into a ThAG/ThSEP3/ThAP3/ThPI-1 complex.
Summarizing, both ThPI paralogs are capable of forming a complex with ThAG and ThSEP3. However, the data suggest that they do so with different affinities, as ThPI-1 shows a higher affinity for this complex than ThPI-2.
We next investigated whether ThPI-2 has a lower affinity than ThPI-1 to form tetramers in general, or whether it is specific for certain protein combinations. We therefore first analyzed tetramer formation with ThAG and a different ThSEP paralog. Interestingly, a single tetrameric complex was observed for both of the ThPI paralogs when a ThSEP4 paralog (Th21984) was used (Fig. 6c). This indicates that the lower affinity to form a ThAG/ThSEP/ThAP3/ThPI complex with ThPI-2 than with ThPI-1 may not be a general feature of ThPI-2.
Next, we studied combinations of the B-class proteins with one ThSEP and one of the ThAP1 paralogs, a protein complex that is important for petal formation based on data from Arabidopsis (Fig. 6d, e, f). Interestingly, using ThSEP3, we observed a single complex when ThPI-1 was present, whereas two complexes were formed when ThPI-2 was present. Similar to what we observed for combinations with ThAG, it seems that the higher complex observed for combinations with ThPI-2 may not contain the B-class proteins, as it runs at the same height as ThAP1 homotetramers. When we tested the interaction of the B-class paralogs with ThAP1 and ThSEP4, we obtained a single complex for either of the ThPI paralogs, again indicating that there is no difference in complex formation with ThSEP4. When we examined combinations of the B-class proteins with ThAP1 and another SEP, ThSEP1/2, we observed a single complex for each protein combination. However, complexes containing ThPI-1 run at a different height than combinations with ThPI-2. These differences in gel shift indicate that these complexes likely have a different protein composition, but we did not study these differences in detail.
From these experiments it can be concluded that ThPI-1 and ThPI-2 are biochemically different, as they show differences in their affinities to form higher-order complexes. ThPI-2 has a lower affinity for certain higher order complexes than ThPI-1. However, this does depend on the interaction partners, as different ThSEP paralogs gave different results. We can conclude that the ThPI paralogs (as well as the ThSEP paralogs) are diverged in their ability to form DNA-binding higher order protein complexes.
In EMSA experiments, usually a single or only a few DNA probes are used and the interaction of these probes with various protein complexes can be tested. However, it is also possible that the two ThPI paralogs differ in their DNA-binding specificity and/or general affinity to DNA. To analyze this, we used SELEX-seq (Systematic Evolution of Ligands by EXponential enrichment followed by deep sequencing)  to test whether there are any differences in DNA-binding specificity between the two ThPI paralogs. We performed SELEX-seq experiments on ThAP3/ThPI heterodimers using a custom-made A. thaliana AP3 antibody, which recognizes both T. hassleriana AP3 paralogs (see Additional file 4: Figure S3).
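The principle behind SELEX-seq, namely that sequences preferred by a protein complex become over-represented in the selected pool relative to the starting library, can be illustrated with a relative k-mer enrichment calculation. The sketch below is an assumption-laden illustration only: the function names, the pseudocount and the toy reads are hypothetical, and it does not reproduce the analysis pipeline used in this study.

```python
from collections import Counter


def kmer_frequencies(reads, k):
    """Relative frequency of every k-mer observed in a list of reads."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def kmer_enrichment(selected_reads, library_reads, k, pseudo=1e-6):
    """Ratio of k-mer frequency in the selected pool over the library.

    `pseudo` is a small pseudocount (an assumption of this sketch) that
    avoids division by zero for k-mers absent from one of the pools.
    """
    sel = kmer_frequencies(selected_reads, k)
    lib = kmer_frequencies(library_reads, k)
    return {
        kmer: (sel.get(kmer, 0.0) + pseudo) / (lib.get(kmer, 0.0) + pseudo)
        for kmer in set(sel) | set(lib)
    }


# Toy pools with made-up reads; real experiments use millions of reads.
library = ["ACGT", "TTTT"]
selected = ["ACGT", "ACGT"]
enrichment = kmer_enrichment(selected, library, k=4)
for kmer, ratio in sorted(enrichment.items(), key=lambda kv: -kv[1]):
    print(kmer, round(ratio, 3))
```

Comparing such enrichment tables between the ThAP3/ThPI-1 and ThAP3/ThPI-2 selections is one conceivable way to look for k-mers preferred by one heterodimer over the other.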
The synteny of the ThPI paralogs is interesting: generally, the genomic location of PI is conserved throughout the angiosperms. However, the duplication that led to the ThPI paralogs transposed one of the PI copies into a different genomic location. Whereas ThPI-2 shares very conserved synteny with PI orthologs from the rest of the eudicots, ThPI-1 is situated in a different genomic location, which it shares with the Brassicaceae.
Although closely related plant species show extensive genome colinearity [55, 56, 82, 83, 84, 85], plant species that are more diverged do not show large amounts of synteny conservation. However, microsynteny of small genomic regions (of several genes) can be found between distant plant lineages, with examples found even between rice and Arabidopsis (diverged 200 MYA) [53, 57]. Interestingly, conservation of microsynteny is not uniform over the genome [53, 57, 86]. This might indicate that synteny is more important for certain genomic regions, and possibly certain genes. Both B-class genes show extreme synteny conservation. PI is conserved in synteny in most angiosperms, except the Brassicaceae. The Cleomaceae are an intermediate form, having one PI paralog that is syntenic with most other angiosperms, and the other one shares its position with the Brassicaceae.
Considering the largely conserved synteny of PI across most angiosperms, the question arises whether the transposition of ThPI-1 into a new genomic location influenced the regulation or function of the gene.
ThPI paralogs diversified biochemically
The two Tarenaya hassleriana PI paralogs share 62% protein identity (duplication 70 million years ago). This amount of sequence divergence falls within the range observed for functional B-class paralogs in other species. For comparison, the GLO paralogs in the Solanaceae have 63–70% protein identity (around 108 million years old), and the paralogs MtPI and MtNGL9 in Medicago truncatula share 73% protein identity (duplication occurred around 39 million years ago) [88, 89]. Interestingly, the “original” (ThPI-2) and the transposed PI (ThPI-1) paralog diverged in sequence; the transposed paralog resembles the Brassicaceae PI in containing a less conserved PI-motif and a six amino acid extension compared with the PI homologs of most other eudicots. It would be interesting to create a detailed phylogeny for PI in the Brassicales, analyzing more Cleomaceae and Brassicaceae species, as well as species in the other families within the Brassicales. Such a phylogeny could help answer questions about the evolutionary history of PI, such as when exactly the transposition took place, when one copy was lost in the Brassicaceae, and whether this copy was lost in more Brassicales families. A more detailed phylogeny could also be used to analyze whether the selection pressure on both paralogs was similar after the duplication.
We investigated if and how the two T. hassleriana PI paralogs have diverged from each other. Although based on RT-qPCR data it was reported that these genes differ in their level of expression, with ThPI-1 being more highly expressed, our analysis of spatio-temporal expression patterns did not show any differences between the two ThPI paralogs, at least in the analyzed developmental stages. In addition, we analyzed RNA-seq data of mature floral organs from another study, in which we did not find significant differences in expression levels between the two paralogs. Therefore, it remains unclear whether these genes differ in their level of expression. While the ThPI paralogs did not subfunctionalize in spatiotemporal expression pattern, we found that the observed sequence divergence has led to functional differences between the two proteins. In our heterologous expression experiment in A. thaliana, only ThPI-1, but not ThPI-2, was able to homeotically transform sepals into petaloid structures. This indicates that the proteins may differ in function, although this was only tested in a heterologous system so far and we cannot exclude that the transgene expression levels in our 14 35S::ThPI-2 lines were not high enough to induce a phenotype in the first whorl organs (Fig. 5g, h), even though we analyzed a similar number of lines for 35S::ThPI-2 as we did for 35S::AtPI and 35S::ThPI-1. Subsequently, we performed two in vitro assays to determine whether the protein properties are different: EMSA to determine differences in the formation of DNA-binding protein complexes, and SELEX-seq to investigate the DNA-binding specificities. The EMSA results show that all four possible AP3-PI heterodimers are formed in vitro, indicating that there has been no subfunctionalization at the level of protein dimerization. Subfunctionalization at the dimerization level has been observed for B-class paralogs in some species [40, 90], but not in others [89, 91, 92].
As the ThAP3 paralogs are highly similar to each other, it is not surprising that we did not find subfunctionalization at the dimerization level. We did however find subfunctionalization at the tetramer level, as the EMSA results showed that ThPI-1 is more strongly engaged in ThSEP3-containing tetrameric complexes than ThPI-2. Other ThSEP proteins however do not differentiate between ThPI-1 and ThPI-2. How relevant this discrepancy between ThPI-1 and ThPI-2 is in higher order complex formation in planta depends on the spatiotemporal expression of the different SEP genes during flower development, as the expression patterns/levels together with protein-protein affinity will determine the composition of the functional tetrameric MADS domain complexes. According to the floral quartet model , a specific tetramer is formed in each type of floral organ, which binds to two adjacent binding sites in regulatory regions of target genes. The composition of the tetramer determines in part the specificity of the protein complex for a particular target sequence. Interestingly, according to RNA-seq data of mature floral organs, the ThSEP paralogs are differentially expressed in an organ-specific manner (Additional file 6: Figure S5, data from ). Differences in expression of the ThSEP paralogs could lead to organ-specific differences in complex formation of ThPI-1 and ThPI-2. Although we show that ThSEP genes are differentially expressed in mature floral organs, the expression patterns of the ThSEP paralogs during flower development are not yet elucidated.
Another feature that impacts which genes are regulated by a TF is its DNA-binding specificity. We determined DNA-binding specificity of the T. hassleriana AP3–1/PI-1 and AP3–1/PI-2 heterodimers in vitro using SELEX-seq, and found slightly different binding motifs for the two AP3–1/PI heterodimers. Both PI/AP3–1 heterodimers bind to CArG-boxes, as expected for MADS-domain proteins. To our knowledge, this is the first determination of DNA-binding specificities of an AP3/PI dimer in any species. The only DNA-binding motif for AP3/PI which is published was generated based on ChIP-seq of the A. thaliana AP3/PI heterodimer . However, this motif can also contain information from other MADS proteins that interact with AP3/PI in higher-order complexes. The SELEX-seq based motifs we obtained for the T. hassleriana PI paralogs are more similar to each other than to this A. thaliana motif, with especially the cytosine at positions 1 and 2 of the CArG-box being more conserved in our T. hassleriana motifs than in the published A. thaliana motif. However, these differences between the A. thaliana motif and our T. hassleriana motifs are likely due to differences in methods used to obtain these motifs. SELEX-seq determines the DNA-binding specificity of TFs to unmethylated DNA in vitro. In contrast, ChIP-seq is an in vivo method, where sequences might be bound indirectly, DNA might be methylated and less accessible, and the AP3/PI heterodimer is likely part of a larger protein complex. Both DNA methylation and interaction with cofactors can influence DNA recognition by TFs. Interestingly, we observed subtle differences in specificity between the two T. hassleriana ThAP3–1/ThPI heterodimers. This may indicate that these paralogs could regulate partly different targets. 
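For readers less familiar with the motif discussed above, the canonical CArG-box is commonly written as CC(A/T)6GG, so that the two cytosines mentioned in the text occupy positions 1 and 2. A minimal sketch for locating perfect matches to this consensus in a DNA sequence is shown below. The function name and the example sequence are hypothetical, and the strict consensus is a simplification, since MADS-domain proteins also bind degenerate variants that this pattern will miss.

```python
import re

# Canonical CArG-box consensus: CC, six A/T bases, GG.
# The lookahead lets overlapping matches be reported as well.
CARG_BOX = re.compile(r"(?=(CC[AT]{6}GG))")


def find_carg_boxes(sequence: str):
    """Return (0-based start, matched 10-mer) for every perfect CArG-box."""
    return [(m.start(), m.group(1)) for m in CARG_BOX.finditer(sequence.upper())]


# Made-up sequence containing one perfect CArG-box.
example = "TTCCAAAAATGGAA"
print(find_carg_boxes(example))
```

Position weight matrices derived from SELEX-seq or ChIP-seq data capture graded preferences at each position, which a fixed consensus like this one cannot represent.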
It would be interesting to compare DNA-binding properties of B-class heterodimers from a wide range of Brassicales species to pinpoint when these differences originated, and whether they are correlated with the transposition events of AP3 and PI.
That paralogous TFs can exhibit differences in DNA-binding specificity has also been shown for another plant TF, LEAFY (LFY). LFY is an important regulator of floral meristem identity and is present as a single-copy TF in most plant species, with the exception of gymnosperms. Gymnosperms typically have two paralogs, LFY and NEEDLY (NLY). SELEX-seq experiments on these paralogous proteins from Welwitschia mirabilis showed that LFY and NLY have different, although overlapping DNA-binding specificities . The differences we observed in DNA-binding specificity between the ThPI paralogs should be experimentally validated. This could be done in vitro, for instance with quantitative EMSAs. To determine whether these differences are relevant in vivo, it would be interesting to perform ChIP-seq experiments with these ThPI paralogs, to determine whether they bind to different sites in the genome.
Although we found differences between the ThPI paralogs in protein-protein interactions and in DNA-binding specificity, we did not investigate whether these differences led to divergence in function in Tarenaya plants. Published data from PI duplications in other species show a range of evolutionary possibilities. In some cases, the genes are redundant, as is the case for the petunia and tomato PI paralogs. In Nicotiana benthamiana, the situation is slightly different as both PI genes are necessary for petal and stamen specification [40, 41]. In the Solanaceae species Physalis floridiana, as well as in Medicago truncatula, the paralogs diverged more substantially, as only one of the PI paralogs seems necessary for petal and stamen specification [88, 89, 95, 96]. However, at least for the Medicago truncatula PI paralogs, it was shown that they were both still under purifying selection, arguing against one paralog being in the process of becoming a pseudogene .
To elucidate how the ThPI paralogs evolved in function, functional studies need to be performed in T. hassleriana. In the absence of mutants, this type of functional data can be obtained using Virus Induced Gene Silencing (VIGS), or mutants could be generated by transformation with CRISPR/Cas9 constructs. However, protocols for these functional studies first need to be developed for T. hassleriana.
The Cleomaceae species T. hassleriana has two copies of the floral B-class gene PI. One of these paralogs is located in the same genomic position as the PI gene in most angiosperms, but the other copy has transposed into a different genomic location, which it shares with the PI genes of Brassicaceae species. These PI paralogs have similar spatiotemporal expression patterns, but may have diverged in function as heterologous expression of ThPI-1, but not ThPI-2, led to a phenotype in Arabidopsis thaliana. We show that the ThPI paralogs have diverged in interaction specificity, using in vitro methods. EMSAs show that ThPI-1 and ThPI-2 behave differently in protein-complex formation. SELEX-seq experiments using AP3/PI heterodimers show that there may be subtle differences in DNA-binding specificity between ThPI-1 and ThPI-2. Therefore, the PI paralogs of T. hassleriana may have diverged from each other in their specificity for both DNA and protein interaction partners.
Tarenaya hassleriana seeds were obtained from Eric Schranz, and are the same genotype as the sequenced Tarenaya. Tarenaya hassleriana was grown in the greenhouse with an average of 22 °C/day and 18 °C/night. Arabidopsis thaliana Col-0 seeds were originally obtained from NASC. Arabidopsis thaliana was grown at 20 °C on rockwool under standard long day (18 h/6 h) conditions.
Synteny blocks were detected using MCScanX. Related synteny blocks containing PI and flanking genes across the Brassicaceae (8 Brassicaceae species + T. hassleriana) and across other angiosperms (12 species + T. hassleriana) were aligned and visualized using one of the Python modules of JCVI, which is available at https://github.com/tanghaibao/jcvi/wiki/MCscan-(Python-version).
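The idea of microsynteny around a focal gene such as PI can be illustrated by counting how many flanking genes in one genome have an ortholog among the flanking genes in another. The sketch below is a deliberately simplified illustration and does not reflect how MCScanX detects collinear blocks. The function name, the window size, the gene identifiers and the ortholog table are all hypothetical.

```python
def shared_flanking_genes(genes_a, focal_a, genes_b, focal_b, orthologs, window=5):
    """Count genes within `window` positions of `focal_a` whose ortholog
    lies within `window` positions of `focal_b`.

    genes_a, genes_b: gene identifiers in chromosomal order.
    orthologs: dict mapping identifiers in genome A to identifiers in genome B.
    """
    ia = genes_a.index(focal_a)
    ib = genes_b.index(focal_b)
    flank_a = genes_a[max(0, ia - window):ia] + genes_a[ia + 1:ia + 1 + window]
    flank_b = set(genes_b[max(0, ib - window):ib] + genes_b[ib + 1:ib + 1 + window])
    return sum(1 for gene in flank_a if orthologs.get(gene) in flank_b)


# Made-up gene orders and ortholog assignments for two genomes.
genome_a = ["a1", "a2", "PI_a", "a3", "a4"]
genome_b = ["b1", "b2", "PI_b", "b3", "b9"]
ortholog_table = {"a1": "b1", "a2": "b2", "a3": "b3", "a4": "b4"}
print(shared_flanking_genes(genome_a, "PI_a", genome_b, "PI_b", ortholog_table, window=2))
```

A high count relative to the window size would indicate that the focal genes sit in a conserved neighborhood, whereas a transposed copy would be expected to share few or no flanking orthologs with the ancestral location.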
Alignments and phylogeny
To calculate percentage identities and similarities between the paralogs, http://imed.med.ucm.es/Tools/sias.html was used, with standard settings. Alignments were generated using Muscle, with a codon-based DNA-sequence algorithm. Phylogenetic analyses were performed in Mega6. Phylogenies were produced with the maximum likelihood method and 1000 bootstrap replicates. Mega6 was run with the default settings, which include the Tamura-Nei substitution model assuming uniform substitution rates. Boxshade was used for the shading of the alignments.
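As an illustration of what a pairwise percentage identity measures, the following minimal sketch computes it from two already aligned sequences. It is not the SIAS implementation: the function name `percent_identity` and the choice of denominator (only columns in which neither sequence has a gap) are assumptions made here, and other tools may normalize differently. The input strings are hypothetical fragments, not the ThPI sequences.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: the denominator is the number of ungapped columns; other
    conventions (e.g. length of the shorter sequence) give different values.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments, for illustration only.
identity = percent_identity("MGRGKIEIKR", "MGRGKVEIKR")
print(f"{identity:.1f}% identity")
```

Because the denominator convention changes the result, it is worth reporting which one was used when comparing values across tools.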
Sequences used were: Arabidopsis thaliana, At3G54340 and At5G20240 (TAIR10); Arabidopsis lyrata, XM_002877924 and XM_002871885; Capsella rubella, Carubv10018833m and Carubv10001962m (genome version 1.0); Aethionema arabicum, AA1026G00001 and AA8G00136 (genome version V2.5); Tarenaya hassleriana, Th2v17263, Th2v17264, modified Th2v21500 and Th2v23456 (genome version 5); Gynandropsis gynandra, Ggy15517, Ggy19834 and Ggy29007 (genome version V3, unpublished); Carica papaya, EF562500; Theobroma cacao, XM_007017619 and XM_007019158; Populus trichocarpa, XM_002300928 and XM_002307424; Vitis vinifera, EF418603 and NM_001280946.
RNA isolation and cDNA synthesis
RNA was isolated from T. hassleriana inflorescences using the RNeasy plant mini kit (Qiagen) according to the manufacturer’s instructions, followed by DNase treatment (Turbo DNA-free, Ambion). cDNA was made using the RevertAid H Minus first strand cDNA synthesis kit (Fermentas) and a custom primer (5′-GGCCAGGCGTCGACTAGTACTTTTTTTTTTTTTTTTT-3′).
RNA in situ hybridisation
Table 1. Primers used for the 3′ RACE and RNA in situ hybridisation. Primers used to obtain the 3′ UTR sequence and to generate the in situ RNA probes are shown.
RNA in situ hybridisation was performed as described previously. Sequences downstream of the MADS-domain coding sequence were used as probes (primers used can be found in Table 1). These sequences were cloned into pCR2.1® TOPO® (Thermo Fisher Scientific) under the T7 promoter and used to prepare digoxigenin-labelled RNA probes. Pictures were taken with a Leica DM6000 microscope and processed with Fiji (ImageJ).
Table 2. Primers used to generate pSPUTK constructs used for in vitro protein production
Table 3. Primers used for heterologous expression experiment
Total RNA was prepared from leaves of all transgenic lines using the Invitrap spin plant RNA mini kit (Stratec) according to the manufacturer’s instructions. cDNA was prepared using the iScript cDNA synthesis kit (Bio-Rad), according to the manufacturer’s instructions. Expression levels were determined by RT-qPCR in leaves of 6–8-week-old plants and normalized against TIP41 expression (At4g34270) (primers used are in Table 3).
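Normalization against a reference gene can be sketched as follows. The text does not state which calculation was used, so this example assumes the common ΔCt approach with a fixed amplification efficiency of 2 (perfect doubling per cycle); the function name `relative_expression` and the Ct values are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference, efficiency=2.0):
    """Expression of a target gene relative to a reference gene (delta-Ct).

    Assumptions: technical replicates are averaged, and both primer pairs
    amplify with the same efficiency (2.0 means doubling every cycle).
    """
    delta_ct = mean(ct_target) - mean(ct_reference)
    return efficiency ** (-delta_ct)


# Hypothetical Ct values for a transgene and the TIP41 reference in one line.
transgene_ct = [24.1, 23.9, 24.0]
tip41_ct = [22.0, 22.1, 21.9]
print(relative_expression(transgene_ct, tip41_ct))
```

A higher Ct for the target than for the reference gives a value below 1, meaning the target transcript is less abundant than the reference under these assumptions.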
SELEX-seq was essentially performed as described before. The dsDNA libraries contained 40-nucleotide randomized sequences flanked by specific barcodes that allowed for multiplexing in high-throughput sequencing, and contained all features required for direct sequencing with an Illumina Genome Analyzer. Proteins were synthesized using the TNT SP6 Quick Coupled Transcription/Translation System (Promega) following the manufacturer’s instructions in a total volume of 20 μl. The binding reaction mix was prepared essentially as described previously for EMSA experiments and contained 20 μl of in vitro-synthesized proteins and 50–100 ng of dsDNA in a total volume of 120 μl. The binding reaction was incubated for 1 h at 21 °C, followed by 1 h immunoprecipitation with 20 μl anti-HA antibodies coupled to magnetic beads (ThermoScientific) in a thermomixer at 21 °C with constant mixing at 700 rpm. After immunoprecipitation, beads were washed 5 times with 150 μl binding buffer without salmon-sperm DNA and rinsed once with 500 μl of 1× TE, and bound DNA was eluted with 50 μl of 1× TE by incubation in a thermomixer for 20 min at 90 °C at full mixing speed. Following this incubation, magnetic beads were immobilized and the supernatant containing the eluted DNA was transferred to a new tube. DNA fragments were amplified for 5 to 11 cycles of PCR with SELEX round-specific primers, and the total amplicon was used in the subsequent SELEX round. The amplification efficiency was checked on an agarose gel. Samples for sequencing were amplified and size-selected by agarose gel purification using the Qiaquick Gel Extraction Kit (Qiagen). Different libraries were multiplexed by mixing equimolar amounts, and sequencing was performed on the HiSeq 2000 (Illumina). Data were analyzed as described previously: sequence reads that did not pass the CASAVA quality filter, or that mapped with no mismatches to the phiX174 genome, were eliminated.
We then calculated the frequencies of the k-mer sequences in each Round except Round 0. Sequences in Round 0 represent a set of randomly synthesized oligonucleotides, and their complexity did not allow for the direct calculation of k-mer frequencies. Therefore, the sequence frequency in Round 0 was estimated by a sixth-order Markov model, as proposed before. The relative affinity for each possible k-mer was calculated as the ratio between the frequency of the k-mer in the final Round of enrichment and its frequency in Round 0, and normalized to 1 by dividing by the value of the k-mer with the highest predicted affinity.
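The ratio-and-normalize calculation described above can be sketched as follows. This is a minimal illustration, not the analysis pipeline used in the study: it assumes the Round 0 background is estimated with a fixed-order Markov chain fitted to the Round 0 reads, and all function names, the toy reads, and the small values of `k` and `order` are hypothetical choices made to keep the example readable.

```python
from collections import Counter


def kmer_frequencies(reads, k):
    """Frequency of every k-mer over all sliding windows in the reads."""
    counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def fit_markov(reads, order):
    """Count contexts (length `order`) and transitions (length `order` + 1)."""
    contexts, transitions = Counter(), Counter()
    for read in reads:
        for i in range(len(read) - order):
            contexts[read[i:i + order]] += 1
            transitions[read[i:i + order + 1]] += 1
    start = kmer_frequencies(reads, order)
    return start, contexts, transitions


def markov_frequency(kmer, model, order):
    """Expected k-mer frequency: start probability times chained transitions."""
    start, contexts, transitions = model
    p = start.get(kmer[:order], 0.0)
    for i in range(order, len(kmer)):
        context = kmer[i - order:i]
        if contexts[context] == 0:
            return 0.0  # context never seen in Round 0
        p *= transitions[kmer[i - order:i + 1]] / contexts[context]
    return p


def relative_affinities(final_reads, round0_reads, k, order):
    """Ratio of final-round to modelled Round 0 frequency, scaled so max is 1."""
    if k <= order:
        raise ValueError("k must be larger than the Markov order")
    observed = kmer_frequencies(final_reads, k)
    model = fit_markov(round0_reads, order)
    ratios = {}
    for kmer, freq in observed.items():
        expected = markov_frequency(kmer, model, order)
        if expected > 0:
            ratios[kmer] = freq / expected
    if not ratios:
        raise ValueError("no k-mer had a non-zero Round 0 estimate")
    top = max(ratios.values())
    return {kmer: ratio / top for kmer, ratio in ratios.items()}


# Toy reads (hypothetical); real libraries hold 40-nt inserts and millions of reads.
round0 = ["ACGT", "TGCA", "CATG", "GTAC"]
final = ["CATG", "CATG", "ACAT"]
affinities = relative_affinities(final, round0, k=3, order=1)
for kmer, value in sorted(affinities.items(), key=lambda kv: -kv[1]):
    print(kmer, round(value, 3))
```

Modelling Round 0 rather than counting it directly matters because a random library is far too complex for each long k-mer to be observed often enough; the Markov chain lets shorter, well-sampled subsequences stand in for the full k-mer.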
We would like to thank Aalt-Jan van Dijk for help with the SELEX-seq data analysis. We would like to thank Johanna Müschner for help with the overexpression experiments. We would like to thank the JIC Bioimaging facility and staff for their help taking the in situ microscopy images.
SdB received a Netherlands Organization for Scientific Research (NWO) Experimental Plant Science graduate school “Master Talent” fellowship. KK wishes to thank the Alexander-von-Humboldt foundation and the BMBF for support.
Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.
SdB, GCA and KK designed and interpreted the experiments, and wrote the manuscript. SdB performed all experiments. JMM analysed the SELEX-seq data. SdB, TZ and MES designed and interpreted the synteny analysis. All authors edited the manuscript and approved of its content prior to publication.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
- 11. Tank DC, Eastman JM, Pennell MW, Soltis PS, Soltis DE, Hinchliff CE, Brown JW, Sessa EB, Harmon LJ. Nested radiations and the pulse of angiosperm diversification: increased diversification rates often follow whole genome duplications. New Phytol. 2015;207(2):454–67.
- 26. Zahn LM, Kong H, Leebens-Mack JH, Kim S, Soltis PS, Landherr LL, Soltis DE, Depamphilis CW, Ma H. The evolution of the SEPALLATA subfamily of MADS-box genes: a preangiosperm origin with multiple duplications throughout angiosperm history. Genetics. 2005;169:2209–23.
- 55. Timms L, Jimenez R, Chase M, Lavelle D, McHale L, Kozik A, Lai Z, Heesacker A, Knapp S, Rieseberg L, et al. Analyses of synteny between Arabidopsis thaliana and species in the Asteraceae reveal a complex network of small syntenic segments and major chromosomal rearrangements. Genetics. 2006;173(4):2227–35.
- 59. Zhao T, Holmer R, de Bruijn S, Angenent GC, van den Burg HA, Schranz ME. Phylogenomic synteny network analysis of MADS-box transcription factor genes reveals lineage-specific transpositions, ancient tandem duplications and deep positional conservation. Plant Cell. 2017;29(6):1278–92.
- 67. Lange M, Orashakova S, Lange S, Melzer R, Theissen G, Smyth DR, Becker A. The seirena B class floral homeotic mutant of California poppy (Eschscholzia californica) reveals a function of the enigmatic PI motif in the formation of specific multimeric MADS domain protein complexes. Plant Cell. 2013;25(2):438–53.
- 72. Kulahoglu C, Denton AK, Sommer M, Mass J, Schliesky S, Wrobel TJ, Berckmans B, Gongora-Castillo E, Buell CR, Simon R, et al. Comparative transcriptome atlases reveal altered gene expression modules between two Cleomaceae C3 and C4 plant species. Plant Cell. 2014;26(8):3243–60.
- 74. Smaczniak C, Immink RG, Muino JM, Blanvillain R, Busscher M, Busscher-Lange J, Dinh QD, Liu S, Westphal AH, Boeren S, et al. Characterization of MADS-domain transcription factor complexes in Arabidopsis flower development. Proc Natl Acad Sci U S A. 2012;109(5):1560–5.
- 86. Gebhardt C, Walkemeier B, Henselewski H, Barakat A, Delseny M, Stüber K. Comparative mapping between potato (Solanum tuberosum) and Arabidopsis thaliana reveals structurally conserved domains and ancient duplications in the potato genome. Plant J. 2003;34(4):529–41.
- 88. Benlloch R, Roque E, Ferrandiz C, Cosson V, Caballero T, Penmetsa RV, Beltran JP, Canas LA, Ratet P, Madueno F. Analysis of B function in legumes: PISTILLATA proteins do not require the PI motif for floral organ development in Medicago truncatula. Plant J. 2009;60(1):102–11.
- 92. Gong P, Ao X, Liu G, Cheng F, He C. Duplication and whorl-specific down-regulation of the obligate AP3-PI heterodimer genes explain the origin of Paeonia lactiflora plants with spontaneous corolla mutation. Plant Cell Physiol. 2017;58(3):411–25.
- 98. Tang H, Krishnakumar V, Li J. jcvi: JCVI utility libraries. 2015. Available at: https://doi.org/10.5281/zenodo.31631.
- 106. Slattery M, Riley T, Liu P, Abe N, Gomez-Alcala P, Dror I, Zhou T, Rohs R, Honig B, Bussemaker HJ, Mann RS. Cofactor binding evokes latent differences in DNA binding specificity between Hox proteins. Cell. 2011;147:1270–82.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.