Main

Darwinian evolution posits that natural selection, acting on heritable, random, ‘successive, slight variations’ in organisms over billions of years, can result in new biological features1. Although recent work has revealed that biological novelty is often attributable to changes in transcriptional regulation2,3,4,5,6, detailed analyses of such changes are often limited to a subset of the cis- or trans-elements involved7,8,9,10,11,12,13. Here, we present a step-by-step analysis of evolution in a combinatorial transcriptional circuit that regulates mating in many yeast species of the ascomycete lineage.

Mating type in the yeasts Saccharomyces cerevisiae and Candida albicans is controlled by a segment of DNA called the MAT locus14,15,16. The MAT locus exists in two versions, MATa and MATα, each of which encodes unique sequence-specific DNA-binding proteins that direct an extensive program of gene transcription. Cells that express only the MATa- or MATα-encoded DNA-binding proteins are a-cells and α-cells, respectively, and are specialized for mating. The a-cells express a-specific genes (asgs), which are required for a-cells to mate with α-cells. Likewise, α-cells express the α-specific genes (αsgs). The third cell type, a/α, is formed when an a-cell mates with an α-cell. These cells do not mate, because the asgs and αsgs are turned off (Fig. 1a).

Figure 1: a-type mating is negatively regulated in modern S. cerevisiae, but was positively regulated in its ancestor.

a, S. cerevisiae and C. albicans transcribe their genes according to one of three programs, which produce the a-, α- and a/α-cells. The particular cell type produced is determined by the MAT locus, which encodes sequence-specific DNA-binding proteins (coloured blocks). Regulation of a-type mating differs substantially between S. cerevisiae and C. albicans. In S. cerevisiae, a-type mating is repressed in α-cells by α2. In C. albicans, a-type mating is activated in a-cells by a2. In both organisms, a-cells mate with α-cells to form a/α-cells, which cannot mate. b, a2 is an activator of a-type mating over a broad phylogenetic range of yeasts16,18–21,47. In S. cerevisiae and close relatives, a2 is missing and α2 has taken over regulation of the asgs22.

Although this strategy is the same for both S. cerevisiae and C. albicans, the molecular details differ in a remarkable way16. In S. cerevisiae, the asgs are on by default, and are repressed in α- and a/α-cells by a homeodomain protein (α2) that is encoded by MATα. In C. albicans, however, the asgs are off by default, and are activated in a-cells by an HMG-domain protein (a2) that is encoded by MATa (Fig. 1a). Both molecular mechanisms give the same logical output: asgs are expressed only in a-cells. As the a2-activation mode is found over a broad phylogenetic range of yeasts, this strategy most likely represents the ancestral state16,17,18,19,20,21 (Fig. 1b). By contrast, the a2 gene was recently lost in the S. cerevisiae lineage, which now uses the α2-repressing mode of asg regulation22, indicating that α2-mediated repression of asgs is a recent innovation.

The evolutionary transition from positive to negative regulation of the asgs has necessarily included at least two steps: (1) asg expression becoming independent of the activator a2, and (2) asgs coming under negative control by α2. We have used experimental and informatic approaches to identify multiple changes in cis- and trans-elements that underlie these steps; we also infer the order in which these steps probably occurred.

Identification of ancestral a-specific genes

To understand how α2 came to repress the asgs in S. cerevisiae, we first sought the ancestral cis-element that was responsible for positive regulation of asgs by a2. We reasoned that extant yeasts that retain the ancestral regulatory logic, such as C. albicans, might also have retained cis-elements close to the ancestral form. C. albicans, a fungal pathogen of humans, last shared a common ancestor with S. cerevisiae 200–800 million years ago16,23,24.

We first experimentally identified the asgs in C. albicans by comparing the transcriptional profiles of pheromone-induced a-cells with those of pheromone-induced α-cells (Fig. 2; for experimental details, see Methods and Supplementary Fig. 1)16,25,26. This comparison revealed a group of six genes that were induced only in a-strains (Fig. 2c). Below, we show that expression of the gene STE2 is also specific to a-cells. Of these seven genes, four have orthologues previously classified as a-specific in S. cerevisiae (ASG7, BAR1, STE2 and STE6), indicating that they were a-specific in the common ancestor of S. cerevisiae and C. albicans.

Figure 2: Identification of a-specific genes in C. albicans.

Pheromone induction profiles of a-cells (RBY731) and α-cells (ATY497) in six pheromone induction time-courses are compared. Top: genes upregulated only in a-cells. Middle: genes upregulated only in α-cells. URA3 is induced because it is under the control of the STE3 promoter. Bottom: subset of genes upregulated in both a- and α-cells. The first two time-courses were previously published by Bennett et al.26.

Identification of C. albicans asg regulatory sequences

To identify cis-elements that are involved in activation by a2, we submitted C. albicans asg promoters (1,000 base pairs (bp)) to the motif-finding program MEME27. In the promoters of six asgs, we found a regulatory element with several distinctive features (Fig. 3a). First, at 26 bp long, the element is longer, and therefore more highly specified, than the typical eukaryotic cis-acting sequence. Second, the sequence contains a region that closely resembles the binding site of Mcm1, a MADS box sequence-specific DNA-binding protein that is expressed equally in all three mating types, and is required for the regulation of both asgs and αsgs in S. cerevisiae. The Mcm1 residues that contact DNA28,29 are fully conserved between C. albicans and S. cerevisiae, strongly implicating this region of the element as a binding site for Mcm1 in C. albicans. Third, the putative Mcm1 site in C. albicans asg promoters lies next to a motif of the consensus sequence CATTGTC (Fig. 3a). The spacing between this motif and the Mcm1 site is always 4 bp. This motif is similar to demonstrated binding sites for a2 orthologues in Schizosaccharomyces pombe and Neurospora crassa, and to the α2 monomer site of S. cerevisiae20,30,31,32 (Fig. 3b).

Figure 3: Identification and validation of the C. albicans asg operator.

a, 1,000 bp of each C. albicans asg promoter were submitted to MEME27. The motif shown was present in six asg promoters. Distance from the translation start site is indicated. The element contains a conserved 7-bp site (yellow) and a putative Mcm1 binding site (orange), separated by 4 bp. b, The 7-bp motif is similar to binding sites of a2 orthologues from N. crassa and S. pombe, as well as α2 from S. cerevisiae20,30–32. S. pombe MatMc binds the two indicated sites equally30. c, d, A wild-type (c) or mutant (d) 2,023-bp fragment of the STE2 promoter was fused to a GFP reporter and integrated at the RP10 locus of C. albicans33. Top panels: uninduced cells. Bottom panels: α-factor induction. Only the wild-type STE2 promoter activates GFP expression (bottom right panels).

Experimental validation of C. albicans asg regulatory sequence

To test whether the motif upstream of C. albicans asgs is functional, we fused a wild-type or mutant fragment of the STE2 promoter to a green fluorescent protein (GFP) reporter33 (Fig. 3c). In the mutant promoter, the conserved motif was mutated from CATTGTC to CATAATC, a change that is predicted to destroy the a2-binding site. The wild-type promoter activated GFP on exposure to α-factor (Fig. 3c), whereas the mutant promoter showed no induction of GFP (Fig. 3d), demonstrating that this cis-element is required for a2-dependent activation of asgs.

Analysis of cis-asg regulation across species

For ancestral asgs to undergo the transition from positive to negative regulation, a2-bound cis-elements were probably lost, whereas α2-bound elements must have been gained. To investigate when this transition occurred, we first inferred a phylogeny of 16 yeast species whose genomes have been sequenced, then identified orthologues of the asgs of C. albicans and S. cerevisiae in all 16 yeasts34,35 (Fig. 4b, see Methods). Position-specific scoring matrices (PSSMs) constructed from the S. cerevisiae or C. albicans asg operators (Fig. 4a) were used to scan the promoters of each asg orthologue. Maximum log10-odds scores are shown in Fig. 4c, d.
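The scanning step can be illustrated with a minimal sketch. It is not the procedure used in this study: the toy sites, the uniform background frequencies and the pseudocount are assumptions made for illustration only, and the sketch scans a single strand. It builds a log10-odds matrix from aligned sites and reports the best-scoring window in a promoter.

```python
import math

BASES = "ACGT"

def build_pssm(sites, background=None, pseudocount=0.5):
    """Build a log10-odds PSSM from equal-length aligned sites."""
    # Assumption: uniform background unless one is supplied.
    background = background or {b: 0.25 for b in BASES}
    total = len(sites) + pseudocount * len(BASES)
    pssm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pssm.append({
            b: math.log10(((column.count(b) + pseudocount) / total) / background[b])
            for b in BASES
        })
    return pssm

def max_log_odds(pssm, promoter):
    """Return (best score, 0-based offset) over all windows on the given strand."""
    width = len(pssm)
    best = (float("-inf"), -1)
    for start in range(len(promoter) - width + 1):
        window = promoter[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows containing ambiguous bases
        score = sum(col[b] for col, b in zip(pssm, window))
        if score > best[0]:
            best = (score, start)
    return best

# Hypothetical aligned sites built around the CATTGTC consensus (not real data).
toy_sites = ["CATTGTC", "CATTGTC", "CATTGTA", "CATTGTC", "TATTGTC", "CATTGTC"]
pssm = build_pssm(toy_sites)

# Hypothetical promoter fragment containing one consensus match.
promoter = "GGGACGCATTGTCAGGTTT"
score, offset = max_log_odds(pssm, promoter)
print(f"best log10-odds score {score:.2f} at offset {offset}")
```

Under these assumptions, a window carrying the CATAATC substitution scores lower than the CATTGTC consensus, which is the behaviour a PSSM scan relies on to separate strong matches from weak ones.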

Figure 4: Analysis of cis-asg regulation across species.

a, S. cerevisiae α2–Mcm1 and C. albicans a2–Mcm1 position-specific scoring matrices (PSSMs) were derived from the seven S. cerevisiae asg operators, or six C. albicans asg operators. b, A phylogeny of 16 sequenced yeasts was inferred using methods similar to those of Rokas et al.34. asg orthologue promoters were scanned with the S. cerevisiae PSSM (c) or C. albicans PSSM (d). Maximum log10-odds scores are shown. Darker shades of red indicate stronger matches. e, Promoters from the K. waltii, K. lactis and E. gossypii orthologues of ASG7, BAR1, STE2 and STE6 were pooled and submitted to MEME27. The recovered motif has elements of both the S. cerevisiae and C. albicans asg operators: an a2-like site resembles that of C. albicans, and the tripartite structure resembles the S. cerevisiae operator.

S. cerevisiae-like asg operators (an Mcm1 site flanked by two α2-binding sites) were clearly found in orthologous promoters of organisms as far diverged as Saccharomyces castellii. In further diverged organisms, the presence of an S. cerevisiae-like asg operator was diminished, although it was found in some Candida glabrata, Kluyveromyces lactis, Eremothecium gossypii and Kluyveromyces waltii promoters (Fig. 4c). The C. albicans PSSM yielded a nearly converse pattern (Fig. 4d). Organisms that branch with C. albicans have C. albicans-like asg operators (an Mcm1 site flanked by a single a2 site); however, this matrix recovered no significant matches in species close to S. cerevisiae, correlating with the loss of a2 (ref. 22). These results are unchanged by recently proposed alternative phylogenetic topologies36.

Identification of the asg operator in the K. lactis branch

Neither the C. albicans matrix nor that of S. cerevisiae elicited strong matches in the K. lactis-branch yeasts, which share a more recent common ancestor with S. cerevisiae than does C. albicans (Fig. 4b). To determine independently whether this lineage has a unique asg operator, we submitted promoters of the ancestral asg orthologues (ASG7, BAR1, STE2 and STE6) from the K. lactis branch yeasts to MEME. The highest scoring hit was a DNA motif with features in common with both the S. cerevisiae and C. albicans asg operators, indicating that it might be a transitional form. As in C. albicans, this motif contains an Mcm1 site flanked by an a2 site on one side. However, it is also defined on the opposite side, resembling the tripartite structure of the S. cerevisiae operator. This additional sequence information is similar to both the S. cerevisiae α2- and C. albicans a2-site consensus binding sequences; moreover, the spacing from the Mcm1 binding sequence is also similar to that found in S. cerevisiae and C. albicans asg operators (Fig. 4e). An independent clustering analysis of putative asg operators supports the idea that there is a transitional form in the K. lactis branch (see Supplementary Fig. 2).

Because of low genome sequence coverage36, we did not systematically incorporate the yeast Saccharomyces kluyveri, which branches near K. lactis and retains a2 (refs 11, 22, 37, 38), into our studies. However, the available sequences of asg promoters from S. kluyveri also contain operators similar to those of the K. lactis branch (not shown), indicating that transitional forms of the operator might also exist in this species.

Emergence of the α2–Mcm1 interaction

Repression of the asgs in S. cerevisiae requires a cooperative interaction between the trans-factors α2 and Mcm1. To determine when this interaction arose, we aligned orthologues of α2 and Mcm1 across several yeast species, then searched for conservation of the interaction interface28,29,39 (Fig. 5a, b). The region of Mcm1 that contacts α2 is highly conserved across all species analysed (Fig. 5a). Many proteins besides α2 contact this region, so the high degree of conservation is not surprising. By contrast, the portion of α2 that contacts Mcm1 varies considerably across yeasts (Fig. 5b). A critical nine-residue ‘linker’ region that is required for the interaction between α2 and Mcm1 in S. cerevisiae39 is highly conserved from S. cerevisiae to C. glabrata, and is also somewhat conserved in K. lactis and S. kluyveri; however, this region shows no conservation in yeasts that branch with C. albicans, consistent with observations that α2 is not involved in asg expression in C. albicans16 (Fig. 1a).
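As a rough illustration of how conservation at an interface can be quantified from a multiple alignment, the sketch below reports, for each species, the fraction of chosen alignment columns that match a reference sequence. The sequences, species labels and column indices are invented for the example and do not correspond to the α2 or Mcm1 alignments analysed here.

```python
def interface_identity(alignment, reference, columns):
    """Fraction of the given alignment columns identical to the reference.

    alignment: dict mapping a species label to its aligned sequence
    reference: label of the sequence used as the comparison standard
    columns:   0-based alignment columns corresponding to interface residues
    """
    ref = alignment[reference]
    identities = {}
    for name, seq in alignment.items():
        if name == reference:
            continue
        # A gap in the reference never counts as a conserved residue.
        matches = sum(1 for c in columns if ref[c] != "-" and seq[c] == ref[c])
        identities[name] = matches / len(columns)
    return identities

# Hypothetical toy alignment (not real protein sequences).
toy_alignment = {
    "reference": "MKTFLRNDE",
    "close_relative": "MKTFLRNDQ",
    "distant_relative": "MAS-GRKTE",
}
toy_interface_columns = [3, 4, 5, 6]

identity = interface_identity(toy_alignment, "reference", toy_interface_columns)
for species, fraction in identity.items():
    print(species, fraction)
```

A per-column identity score of this kind ignores conservative substitutions; a substitution matrix could be used instead where chemically similar residues should count as partially conserved.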

Figure 5: Evolution of the α2–Mcm1 interaction.

a, Mcm1 sequences from 12 species are aligned. Arrows denote residues of Mcm1 that contact α2 in S. cerevisiae28,29. b, α2 sequences from 13 species are aligned. Arrows indicate residues of α2 that contact Mcm1, and are required for α2–Mcm1 repression29,39. This region is well conserved out to C. glabrata, with K. lactis and S. kluyveri α2 also showing significant conservation. c, The K. lactis α2–Mcm1 complex was modelled using the crystal structure of the S. cerevisiae α2–Mcm1 complex (PDB ID: 1MNM; Tan S 1998) as a template. Left: S. cerevisiae α2 linker region and Mcm1 interface. Mcm1–Arg87 (blue asterisk of a) and α2–Phe116 (green asterisk of b) form a favourable pi-stacking interaction. Right: K. lactis model. The Arg87–Phe116 interaction is not present, indicating that the K. lactis interaction is probably weaker than that of S. cerevisiae.

Structural homology modelling of K. lactis α2 and Mcm1 using the S. cerevisiae crystal structure29 as a template shows that, despite several substitutions, the α2–Mcm1 interaction interfaces in K. lactis are fully compatible40 (Fig. 5c); thus, the appearance of the α2–Mcm1 interaction coincides with the emergence of the tripartite, S. cerevisiae-like asg operator in the K. lactis branch (Fig. 4). This suggests that the K. lactis asg operators are bound by α2–Mcm1. We also know that K. lactis a2 is required for wild-type levels of a-type mating (A.E.T., unpublished work). Together, our data indicate that K. lactis asgs are controlled by both α2 and a2 through one of three possible scenarios: (1) some operators are bound exclusively by a2–Mcm1 and others are bound exclusively by α2–Mcm1, (2) hybrid operators are bound by both a2–Mcm1 and α2–Mcm1, or (3) a combination of these.

Discussion

In this work, we identify a group of genes (the asgs) that was positively regulated in an ancestral yeast, but is negatively regulated in modern S. cerevisiae. Orthologues of these genes are required for sexual differentiation in fungal lineages that are proposed to span up to 1.3 billion years of evolution24,41. We identify specific changes in cis- and trans-elements that underlie the two critical steps in this transition: (1) asg expression becoming independent of the activator a2, and (2) asg expression coming under negative control of α2. The nature of these changes provides a plausible explanation for how fitness barriers were overcome during the regulatory transition, both in terms of the smaller-scale challenges of evolving individual protein–protein and protein–DNA interactions, and as regards the larger-scale challenge of maintaining appropriate asg regulation throughout the transition.

Independence of asg expression from the activator a2

During the transition from positive to negative regulation of the asgs, asg expression became independent of the activator a2. We have shown that the transcriptional regulator Mcm1 was present at ancestral asg promoters as a co-activator with a2; in S. cerevisiae, Mcm1 is also present at asg promoters, serving as both an activator (on its own) and a co-repressor (with α2). In S. cerevisiae, high A/T content surrounding the Mcm1 binding site allows Mcm1 to function without a cofactor42. Therefore, a simple increase in the A/T content surrounding the ancestral Mcm1 binding site could ‘tune up’ existing Mcm1 activity so that it no longer requires the cofactor a2 to activate transcription. Consistent with this idea, the A/T content flanking Mcm1 sites in S. cerevisiae asg operators is far higher than that flanking Mcm1 sites in C. albicans asg operators (Fig. 4a).
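The comparison of flanking A/T content can be sketched as follows. The flank width, the placeholder core site and the two example operators are assumptions chosen for illustration; they are not the operator sequences shown in Fig. 4a.

```python
def flank_at_content(sequence, site_start, site_len, flank=5):
    """A/T fraction in the bases immediately flanking a binding site.

    sequence:   DNA string containing the site
    site_start: 0-based index of the first base of the site
    site_len:   length of the site
    flank:      number of bases to examine on each side (assumed width)
    """
    left = sequence[max(0, site_start - flank):site_start]
    right = sequence[site_start + site_len:site_start + site_len + flank]
    flanks = (left + right).upper()
    if not flanks:
        raise ValueError("site has no flanking sequence")
    return sum(base in "AT" for base in flanks) / len(flanks)

# Placeholder core used only to mark where the site sits (hypothetical).
core = "CCTAATTAGG"

# Two invented operators that differ only in their flanks.
at_rich_operator = "AATTA" + core + "TTAAT"
gc_rich_operator = "GCGCA" + core + "CGGCT"

at_rich = flank_at_content(at_rich_operator, site_start=5, site_len=len(core))
gc_rich = flank_at_content(gc_rich_operator, site_start=5, site_len=len(core))
print(at_rich, gc_rich)
```

Averaging this value over all operators from each species would give a simple per-species summary, although the appropriate flank width depends on how far the relevant protein–DNA contacts extend.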

Establishment of asg repression by α2

On its own, an increase in A/T content flanking the Mcm1 site would lead to inappropriate constitutive activation of the asgs, as Mcm1 is expressed equally in all cell types. However, asg regulation could be maintained if this increase were accompanied by evolution of α2-mediated repression. Indeed, this is precisely what we observe: cis- and trans-changes, signifying the emergence of α2-mediated repression in the K. lactis branch, accompany the increase in A/T content surrounding the Mcm1 site (Figs 4e, 5). Previous involvement of Mcm1 in asg regulation probably assisted in the evolution of α2-mediated repression by increasing the number of surfaces available for α2-promoter interaction to include both protein and DNA.

The similarity of the a2-binding site (CATTGTC) to the α2-binding site (CATGT), in both sequence and spacing from the Mcm1 site, no doubt contributed to the evolution of the S. cerevisiae asg operator (Figs 3, 4); a small change to the cis-element could convert it from an a2- to an α2-recognition sequence. The similarity of the sites is particularly striking, given that a2 and α2 belong to different protein families (the HMG and homeodomain families, respectively).

Ordering the pathway

An important clue as to the order in which individual cis- and trans- changes occurred comes from K. lactis and S. kluyveri. Both yeasts have retained a2 at their MATa loci, but in both yeasts an α2–Mcm1 interaction interface and a tripartite asg operator similar to the S. cerevisiae α2–Mcm1 binding site have emerged. By examining the data in a phylogenetic context (Fig. 6d), we can tentatively define the succession of events leading to repression of asgs in modern S. cerevisiae as follows. First, a2–Mcm1 activated asgs in an ancestor (Fig. 6a, d). Subsequently, the α2–Mcm1 protein interaction evolved, coincident with evolution of an α2 site and a strengthening of the Mcm1 binding site in the asg operator (Fig. 6b, d). After the divergence of K. lactis, the α2–Mcm1 cis-operator specificity and A/T content were increased, and a2 was lost, completing the hand-off from positive to negative control (Fig. 6c, d). A crucial feature of this model is that asgs are appropriately regulated throughout each stage of circuit evolution, a condition made possible by the continued presence of Mcm1. Intriguingly, both the loss of a2 and the conversion of asg regulation to an exclusively negative regulatory scheme coincide with a whole-genome duplication43,44. The evolution of the asg regulatory circuit might have been facilitated in part by greater flexibility in asg regulation conferred by duplication of its component cis- and trans-elements.

Figure 6: Ordering the changes in cis- and trans-regulatory elements.

a, In an ancestral yeast, a2–Mcm1 activated asgs in a-cells. This scheme persists in modern C. albicans. b, cis- and trans-elements in the K. lactis branch suggest that asgs are positively regulated by a2–Mcm1 in a-cells and negatively regulated by α2–Mcm1 in α-cells. c, In modern S. cerevisiae, asgs are activated by Mcm1 in a-cells and repressed by α2–Mcm1 in α-cells. d, The regulatory schemes shown in a–c are mapped onto extant species and ancestral nodes. Species from C. albicans to S. pombe most closely resemble a (refs 16–19), whereas K. lactis fits b and S. cerevisiae and C. glabrata fit c. The most parsimonious evolutionary scenario maps scheme a as the ancestral state. Scheme b is transitional, first appearing in the ancestor of K. lactis and S. cerevisiae. Scheme c is the most derived, appearing in the ancestor of C. glabrata and S. cerevisiae.

Conclusion

Our analysis shows how a concerted series of subtle changes in cis- and trans-elements can lead to a profound evolutionary change in the wiring of a combinatorial circuit. These changes include: (1) ‘tuning up’ of a binding site for a ubiquitous activator, making gene expression independent of a cell-type-specific activator; (2) a small change in an existing DNA-binding site, converting its recognition from one protein to that of an unrelated protein; (3) a small change in the amino-acid sequence of a sequence-specific DNA-binding protein, allowing it to bind DNA cooperatively with a second protein. Significantly, the coordinated optimization of protein–DNA and protein–protein interactions that we have described allows regulation of the target genes to be maintained throughout a major evolutionary transition. Because the proteins that have participated in this transition represent several highly conserved and prominent protein families, including the MADS box family (Mcm1), the HMG-domain family (a2) and the homeodomain family (α2), the types of change we have described are likely to apply to other examples of transcriptional circuit evolution.

Methods

Detailed Methods are described in the Supplementary Information.

Strain construction

The pheromone a-factor has not yet been identified in C. albicans. To compare the pheromone response of a-cells to that of α-cells, we ‘fooled’ α-cells into responding to α-factor by ectopically expressing the α-factor receptor (strain ATY497), a strategy previously employed in S. cerevisiae25. Constructs and primers used are listed in Supplementary Information.

Induction with α-factor

Strains were grown to an optical density (OD600) of 1.0 in YEPD plus 55 µg ml⁻¹ adenine, then induced with 10 µg ml⁻¹ α-factor from a stock dissolved in either dimethylsulphoxide (DMSO) or water. Sample preparation and microarrays were as previously described26. All microarray data are available online (http://genome.ucsf.edu/asg_evolution/).

Yeast phylogeny

Briefly, groups of orthologous genes (see Supplementary Information) with one and only one representative from each of the 16 yeasts were multiply aligned with ClustalW45, then concatenated to yield a single alignment. A maximum-likelihood species tree was inferred from this alignment using the TREE-PUZZLE algorithm46. Trees with identical topologies were also generated using additional algorithms (see Supplementary Information).
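The filtering and concatenation steps described above can be sketched as follows. This is an illustrative outline only: the species labels, gene identifiers and aligned sequences are invented, and the alignment and tree-inference steps themselves (ClustalW and TREE-PUZZLE in this study) are not reproduced.

```python
def single_copy_groups(groups, species):
    """Keep orthologue groups with one and only one gene from every species.

    groups:  list of groups, each a list of (species, gene_id) pairs
    species: the full list of species that must be represented
    """
    wanted = set(species)
    kept = []
    for group in groups:
        names = [sp for sp, _ in group]
        # Equal length plus equal sets rules out both duplicates and gaps.
        if len(names) == len(wanted) and set(names) == wanted:
            kept.append(group)
    return kept

def concatenate_alignments(alignments, species):
    """Join per-gene alignments into one alignment per species.

    alignments: list of dicts mapping species to an aligned sequence;
                sequences within one alignment must share a length
    """
    parts = {sp: [] for sp in species}
    for aln in alignments:
        if len({len(aln[sp]) for sp in species}) != 1:
            raise ValueError("sequences within an alignment must have equal length")
        for sp in species:
            parts[sp].append(aln[sp])
    return {sp: "".join(chunks) for sp, chunks in parts.items()}

# Hypothetical three-species example (the study used 16 yeasts).
toy_species = ["sp_A", "sp_B", "sp_C"]
toy_groups = [
    [("sp_A", "g1"), ("sp_B", "g2"), ("sp_C", "g3")],                  # kept
    [("sp_A", "g4"), ("sp_A", "g5"), ("sp_B", "g6"), ("sp_C", "g7")],  # paralogue
    [("sp_A", "g8"), ("sp_B", "g9")],                                  # missing sp_C
]
kept_groups = single_copy_groups(toy_groups, toy_species)

toy_alignments = [
    {"sp_A": "MK-T", "sp_B": "MKQT", "sp_C": "MR-T"},
    {"sp_A": "LLV", "sp_B": "LIV", "sp_C": "L-V"},
]
supermatrix = concatenate_alignments(toy_alignments, toy_species)
print(len(kept_groups), supermatrix)
```

Requiring exactly one representative per species avoids having to choose among paralogues, at the cost of discarding groups in which any species has a duplicate or a missing gene.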

Structural modelling

The K. lactis α2–Mcm1 interaction was modelled using the Protein Local Optimization Program, by M. P. Jacobson, Department of Pharmaceutical Chemistry, University of California San Francisco, USA (http://francisco.compbio.ucsf.edu/~jacobson/), using the crystal structure of the S. cerevisiae α2–Mcm1 complex (PDB ID: 1MNM; Tan S 1998) as a template.