Evolutionary insights into the role of the essential centromere protein CAL1 in Drosophila
Centromeres are essential cis-elements on chromosomes that are crucial for the stable transmission of genetic information during mitotic and meiotic cell divisions. Different species employ a variety of centromere configurations, from small genetically defined centromeres in budding yeast to holocentric centromeres that occupy entire chromosomes in Caenorhabditis, yet the incorporation of nucleosomes containing the essential centromere-specific histone H3 variant CENP-A is a common feature of centromeres in all eukaryotes. In vertebrates and fungi, CENP-A is specifically deposited at centromeres by a conserved chaperone, called HJURP or Scm3, respectively. Surprisingly, homologs of these proteins have not been identified in Drosophila, Caenorhabditis, or plants. How CENP-A is targeted to centromeres in these organisms is not known. The Drosophila centromeric protein CAL1, found only in the order Diptera, is essential for CENP-A localization, is recruited to centromeres at a similar time as CENP-A, and interacts with CENP-A in both chromatin and pre-nucleosomal complexes, making it a strong candidate for a CENP-A chaperone in this lineage. Here, we discuss the conservation and evolution of this essential centromere factor and report the identification of a “Scm3-domain”-like region with similarity to the corresponding region of fungal Scm3 as well as a shared predicted alpha-helical structure. Given the lack of common ancestry between Scm3 and CAL1, we propose that an optimal CENP-A binding region was independently acquired by CAL1, which caused the loss of an ancestral Scm3 protein from the Diptera lineage.
Keywords: Centromere, chromosome, mitosis, meiotic drive, kinetochore, CENP-A, CAL1, CID, histone variants, chromatin
Abbreviations
BLAST: Basic local alignment search tool
CAL1: Chromosome alignment defect 1
CATD: CENP-A targeting domain
DNMT3: DNA methyl transferase 3
HJURP: Holliday junction recognizing protein
HMM: Hidden Markov model
KNL2: Kinetochore null 2
LRT: Likelihood ratio test
M18BP1: Mis18 binding protein 1
PAML: Phylogenetic analysis by maximum likelihood
Scm3: Suppressor of chromosome missegregation 3
Cells use specialized regions of chromosomes, called centromeres, to define the location where kinetochores form during mitosis and meiosis. Kinetochores physically connect sister chromatids to spindle microtubules and mediate their transit towards opposite poles of forming daughter cells. Centromeres are stably inherited from one cell generation to the next and from parent to progeny. Within a species, the centromere position is maintained throughout evolutionary times. Centromere inheritance relies on the faithful incorporation of a unique type of chromatin, which contains the essential and universally conserved histone H3 variant, CENP-A (called CID in Drosophila) (Allshire and Karpen 2008). CENP-A loss from centromeres results in the failure to assemble kinetochores (Liu et al. 2006) while mislocalization or direct targeting of CENP-A to chromosome arms is sufficient to generate ectopic kinetochores (Heun et al. 2006; Mendiburo et al. 2011). Therefore, the presence of CENP-A within chromatin defines the position of the centromere.
Centromere identity relies on two pathways. First, pre-existing CENP-A nucleosomes must be stably transmitted to the newly forming sister chromosomes during DNA replication (centromere maintenance). Second, newly synthesized CENP-A must be incorporated onto pre-existing CENP-A nucleosome “templates” to replenish the nucleosomes lost during DNA replication (centromere assembly). While little is known about the pathways and molecules involved in centromere maintenance, much has been learned about the process of centromere assembly, particularly thanks to recent studies in human cells and Xenopus extracts (Barnhart et al. 2011; Bassett et al. 2012; Guse et al. 2011; Moree et al. 2011; Shuaib et al. 2010; Dunleavy et al. 2009; Foltz et al. 2009).
It has long been appreciated that the DNA underlying centromeres is dramatically divergent across evolutionarily distant organisms. With the exception of yeast, mouse, and some human cell lines (Clarke and Carbon 1980; Hahnenberger et al. 1989; Harrington et al. 1997; Ikeno et al. 1998; Moralli et al. 2006), this DNA is not sufficient to attract CENP-A. Aside from the holocentric organism C. elegans, where CENP-A deposition occurs de novo at each cycle (Gassmann et al. 2012), metazoans use pre-existing CENP-A chromatin as a cue for further CENP-A recruitment at each cell cycle. This phenomenon is highlighted by the observation that once CENP-A is incorporated onto DNA, through either direct or indirect targeting strategies to chromosome arms, it is stably transmitted by the endogenous assembly pathway (Barnhart et al. 2011; Mendiburo et al. 2011). CENP-A has virtually no turnover at centromeres in animal cells as it is distributed to sister centromeres semi-conservatively (Hemmerich et al. 2008; Jansen et al. 2007; Mellone et al. 2010) and thus is always available to provide the “signal” needed by the CENP-A assembly machinery. Here, we explore the CENP-A assembly pathway of vertebrates and fungi, which harbor conserved centromere assembly factors, and of insects, which appear to have evolved an independent set of molecules to accomplish this essential function. Using bioinformatics and phylogenetics, we provide new insights into how CENP-A assembly may be mediated in these species and speculate that, despite the divergence at the protein level, the underlying mechanisms may be conserved.
Mechanisms of CENP-A assembly
How the presence of the CENP-A mark triggers the assembly of newly synthesized CENP-A at every cell cycle has been the subject of intense investigation. The molecules responsible for this pathway must accomplish two tasks: recognition of pre-existing CENP-A chromatin and assembly of new CENP-A. A key player in this process is the CENP-A assembly factor HJURP, which has been found in tetrapods as well as a choanoflagellate (Sanchez-Pulido et al. 2009; Foltz et al. 2009; Dunleavy et al. 2009). A homolog, called Scm3, is also present in fungi (Camahort et al. 2007; Mizuguchi et al. 2007; Pidoux et al. 2009; Stoler et al. 2007). HJURP selectively recognizes pre-nucleosomal CENP-A from canonical histone H3 and targets it to centromeres, starting in telophase and continuing through G1 (Jansen et al. 2007; Lagana et al. 2010). Both HJURP and Scm3 have been shown to possess CENP-A assembly activities in vitro (Dechassa et al. 2011; Barnhart et al. 2011; Shivaraju et al. 2011). In humans, HJURP is recruited to the centromere by M18BP1 (Barnhart et al. 2011), a subunit of the Mis18 complex, which is essential for CENP-A incorporation (Fujita et al. 2007; Hayashi et al. 2004). M18BP1 is recruited to centromeres by direct binding to CENP-C (Moree et al. 2011; Dambacher et al. 2012), a constitutive centromere protein that connects the centromere to the kinetochore (Screpanti et al. 2011; Przewloka et al. 2011) and which binds to the very C-terminal residues of CENP-A (Guse et al. 2011).
Despite the crucial importance of Scm3 and HJURP and despite the fact that these proteins are shared between organisms as divergent as Homo sapiens and Saccharomyces cerevisiae (which last shared a common ancestor about one billion years ago), Scm3 and HJURP are not universal among metazoans as they appear to be absent from Drosophila and Caenorhabditis species (Sanchez-Pulido et al. 2009). A functional screen for regulators of the localization of CID also did not identify a HJURP/Scm3 homolog in Drosophila (Erhardt et al. 2008). Drosophila and Caenorhabditis species may have unrecognizable HJURP/Scm3 orthologs due to rapid evolution occurring in these lineages (Sanchez-Pulido et al. 2009); alternatively, novel proteins capable of interacting with CENP-A and assembling it onto chromatin may have emerged in their genomes. Notably, plants also do not harbor HJURP/Scm3 homologs (Dubin et al. 2010). Thus, HJURP and Scm3, unlike CENP-A, are not universal in Eukarya.
The ability to distinguish CENP-A from histone H3 is crucial to ensure centromere integrity. A region spanning part of α-helix 1, loop 1, and α-helix 2 within CENP-A (called the centromere targeting domain (CATD)) has been shown to be critical for CENP-A centromeric localization in humans, Drosophila and yeast (Shelby et al. 1997; Vermaak et al. 2002; Collins et al. 2007) and for recognition by HJURP (Bassett et al. 2012; Sekulic et al. 2010; Foltz et al. 2009). In HeLa cells, a chimeric histone H3 containing the CENP-A CATD (H3CATD) was shown to be sufficient for centromeric localization and kinetochore function, suggesting that this region is critical for the assembly of centromeric factors (Black et al. 2007). Furthermore, recombinant co-expressed chimeric H3CATD and histone H4 directly interact with GST-HJURP in vitro (Foltz et al. 2009). In contrast, H3CATD chimeras do not localize to centromeres in Drosophila (Moreno-Moreno et al. 2011) or Arabidopsis (Ravi et al. 2010), demonstrating that this domain, albeit important, is not sufficient for centromere targeting in these species. It is possible that the alternative CENP-A chaperones used in organisms lacking HJURP/Scm3 have different specificity requirements.
Another unusual feature of Drosophila species is that they also lack members of the Mis18 complex (which in humans are Mis18α, Mis18β, and M18BP1) whereas an M18BP1 homolog, called KNL2, is present in nematodes (Maddox et al. 2007). Along with HJURP recruitment (Barnhart et al. 2011), functions of the Mis18 complex include transient histone H3 acetylation at the centromere in humans (Ohzeki et al. 2012), as well as centromeric recruitment of DNMT3 and DNA methylation in mice (Kim et al. 2012). This suggests that the Mis18 complex functions by recruiting the CENP-A assembly machinery and by creating the correct epigenetic context for its incorporation.
Drosophila CAL1 is an essential regulator of CID assembly
An intriguing candidate that might fulfill the roles of either HJURP or the Mis18 complex, or both, in Drosophila is CAL1 (chromosome alignment defect 1) (Goshima et al. 2007; Erhardt et al. 2008). Depletion of CAL1 completely abolished the localization of CID and CENP-C and resulted in failure to segregate chromosomes (Erhardt et al. 2008). Using yeast two-hybrid assays, it was shown that the N terminus of CAL1 (residues 1–407) interacts with CID, while the C terminus (residues 699–979) interacts with CENP-C. The middle region (aa 392–722), which is less conserved across CAL1 orthologs, is required for the nucleolar localization of CAL1, but its function, if one exists, is unknown. Interestingly, CID and CENP-C do not interact in the absence of CAL1 (Schittenhelm et al. 2010). This is in sharp contrast with observations from reconstituted CENP-A chromatin in Xenopus egg extracts, where the C-terminal CENP-A tail is sufficient to recruit CENP-C (Guse et al. 2011).
The observation that CAL1 is required to bring CID and CENP-C together in a complex has led to the proposal that CAL1 may simply be a bridging factor (Schittenhelm et al. 2010). However, the observation that CAL1 and CID interact in pre-nucleosomal complexes, and that CAL1 is recruited along with CID in mitosis (Mellone et al. 2010), suggests that CAL1 may also play a role in delivering CID to the centromere.
Identification of novel CAL1 orthologs outside of the Drosophila genus
Determining whether or not CAL1 has homologs beyond Drosophila could shed light on its origin and determine if CAL1 and HJURP/Scm3 homologs coexist in non-Drosophilid species. There are currently 12 orthologs of CAL1 listed on FlyBase, all from within Drosophila: Drosophila melanogaster, Drosophila simulans, Drosophila sechellia, Drosophila yakuba, Drosophila erecta, Drosophila ananassae, Drosophila pseudoobscura, Drosophila persimilis, Drosophila grimshawi, Drosophila mojavensis, Drosophila virilis, and Drosophila willistoni. Our searches using the Basic Local Alignment Search Tool (BLAST) revealed no additional significant hits aside from the known CAL1 orthologs. Because position-specific iterated BLAST (PSI-BLAST) is more sensitive than standard BLAST, we used it to identify more distant homologs. A second iteration identified an unnamed sequence (AGAP004338-PA) from Anopheles gambiae that showed some conservation with CAL1 and a third iteration identified an unnamed sequence (EFR29609) from Anopheles darlingi, which also showed similarity with CAL1. Next, the query sequence was divided according to the three functional domains of D. melanogaster CAL1 (Schittenhelm et al. 2010) and separate BLASTp (protein BLAST) searches were performed for each. A BLASTp of the N terminus, in addition to all the known CAL1 homologs within Drosophila, identified hypothetical proteins from Culex quinquefasciatus (XP_001843218.1) and Aedes aegypti (AaeL_AAEL009578). We conclude from these findings that CAL1 is conserved outside of the Drosophila genus. Interestingly, a BLASTp of the C terminus identified only the known Drosophila CAL1 orthologs, suggesting that the N terminus is more conserved than the C terminus across distant dipteran species. BLASTp of the middle region of CAL1 only identified CAL1 orthologs from D. yakuba and D. simulans, consistent with the previously reported divergence of this part of CAL1 (Erhardt et al. 2008; Schittenhelm et al. 2010).
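The domain-wise queries described above could be prepared with a short script. The following is a minimal sketch and not the pipeline used in this study: it assumes the residue boundaries reported for D. melanogaster CAL1 in the yeast two-hybrid experiments (N terminus 1–407, middle region 392–722, C terminus 699–979), and the function names, FASTA header format, and placeholder sequence are hypothetical.

```python
# Residue boundaries (1-based, inclusive) as quoted in the text for D. melanogaster CAL1.
# The dictionary name and keys are illustrative, not taken from the study.
CAL1_DOMAINS = {
    "N_terminus": (1, 407),
    "middle": (392, 722),
    "C_terminus": (699, 979),
}


def split_into_domains(sequence, domains=CAL1_DOMAINS):
    """Return {domain_name: subsequence} using 1-based inclusive coordinates."""
    fragments = {}
    for name, (start, end) in domains.items():
        if start < 1 or end > len(sequence) or start > end:
            raise ValueError(
                f"{name}: range {start}-{end} does not fit a sequence of length {len(sequence)}"
            )
        # Convert 1-based inclusive coordinates to a Python slice.
        fragments[name] = sequence[start - 1:end]
    return fragments


def to_fasta(fragments, prefix="CAL1"):
    """Format each fragment as one FASTA record, giving one query per domain."""
    return "\n".join(f">{prefix}_{name}\n{seq}" for name, seq in fragments.items())


# Placeholder of 979 residues (assumed length); substitute the real protein sequence.
placeholder = "A" * 979
fragments = split_into_domains(placeholder)
fasta_text = to_fasta(fragments)
```

Each record in `fasta_text` can then be submitted as a separate protein query. Note that the quoted boundaries overlap slightly, so a few residues appear in two fragments.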
Additional BLASTp, BLASTn, and BLASTx (identification and comparison of protein coding sequences in genomic DNA) were carried out using the CAL1 sequence of D. melanogaster alongside the putative CAL1 ortholog from D. grimshawi, which, being a more distant relative of the D. melanogaster CAL1, could help to identify more distant homologs in the species tree. However, no additional sequences with significant similarity were found, leading us to speculate that CAL1 is not present outside of Diptera. Importantly, BLAST searches within Diptera species using HJURP and Scm3 sequences and alignments did not yield any significant hits, consistent with the idea that HJURP/Scm3 are not conserved in Diptera and that CAL1 and HJURP/Scm3 are mutually exclusive proteins.
The N terminus of CAL1 contains a “Scm3 domain”-like region
After BLAST searches failed to identify any relatives of CAL1 outside Diptera genomes, we used HHPred to identify homologous regions within CAL1 in other species (Soding et al. 2005). HHPred identifies hits by comparing a profile hidden Markov model (HMM) of the input alignment to HMMs of sequences in a variety of databases. Because it searches alignment as opposed to sequence databases, and scores based on both sequence similarity and secondary structure similarity, HHPred is a far more sensitive homolog detection algorithm than BLAST. For example, HHPred was recently used to demonstrate that HJURP and Scm3 shared common ancestry (Sanchez-Pulido et al. 2009).
An HHPred of full-length K. lactis Scm3 did not yield any Drosophila sequences, but, predictably, included HJURP (residues 14–55), which matched residues 58–99 in K. lactis Scm3 (E value of 0.18). An alignment spanning a region of 36 residues in length, including the region in D. melanogaster CAL1 that matched K. lactis Scm3 from a subset of CAL1 sequences (D. melanogaster, D. mojavensis, D. grimshawi, D. persimilis, D. erecta, D. ananassae, D. yakuba, and D. willistoni), was run in HHPred to determine if including a broader phylogenetic representation of Drosophilid CAL1 sequences improved the probability of similarity. This analysis identified the K. lactis Scm3 sequence (residues 76–97) as the top hit (E value of 0.55), and the S. cerevisiae Scm3 (residues 106–126) from a Cse4 + Scm3 + H4 single-chain fusion database entry (E value of 4.8) as another significant hit. Thus, the inclusion of more distant Drosophila species increased the significance of these HHPred searches in identifying similarity between CAL1 and Scm3.
The profile of the fungal Scm3 protein alignment was then compared with that of Drosophila CAL1, and vice versa, specifically for these 36 residues using HHPred. In each comparison, sequence similarity between these two families was statistically significant (E values of 5.1 × 10⁻⁵ and 6.3 × 10⁻⁵, respectively; Fig. 2b). A similar analysis of the profiles of fungal Scm3 and metazoan HJURP alignments in HHPred yielded highly significant similarity (E value of <10⁻⁵), further supporting the previously identified common ancestry between HJURP and Scm3 (Sanchez-Pulido et al. 2009). Interestingly, PSIPRED, a secondary structure prediction algorithm, predicted the presence of an α-helix in this region of CAL1; similarly, the corresponding region in K. lactis Scm3 forms a slightly longer α-helix (residues 44–103) in the region that makes extensive contacts with both Cse4 and histone H4 (Sanchez-Pulido et al. 2009; Cho and Harrison 2011). Thus the similarity, as far as this region of CAL1 is concerned, extends beyond the sequence to the predicted protein structure.
Despite the apparent similarity of this CAL1 region to the Scm3 domain, our analyses do not support the existence of common ancestry between CAL1 and Scm3, consistent with a previous study which failed to identify Scm3 homologs in Drosophila (Sanchez-Pulido et al. 2009). CAL1 is a much larger protein than either HJURP or K. lactis Scm3 and the sequence similarity appears limited to a region within the Scm3 domain, with no discernible conservation across other parts of the protein sequence. However, given the functional similarities between CAL1 and Scm3/HJURP in the process of CENP-A loading, and the fact that the N terminus of CAL1 interacts with CID (Schittenhelm et al. 2010), we speculate that the CAL1 N terminus may have independently evolved a region that resembles part of the Scm3 domain of Scm3 to carry out CENP-A binding through convergence. Future studies aimed at elucidating the structure of the CAL1 N terminus in complex with CID, as well as studies addressing the ability of CAL1 to recruit CID, are needed to determine whether CAL1 and HJURP/Scm3 are functionally and, at least in part given the findings above, structurally analogous proteins.
Is CAL1 evolving under positive selection?
Previous studies have shown that CENP-A is evolving adaptively in Drosophila, Arabidopsis, and primates (Malik and Henikoff 2001; Talbert et al. 2002; Schueler et al. 2010). It has been proposed that adaptive evolution of centromere-binding proteins occurs to counterbalance the harmful effects of centromere drive, an expansion of centromeric satellites that leads to the preferential segregation of the “stronger” centromere into the oocyte of organisms with asymmetric female meiosis (Henikoff et al. 2001; Malik and Henikoff 2002). The difficulty in identifying orthologs for CAL1, coupled with the adaptive evolution of CID and the functional association between these two proteins, raises the question of whether CAL1 may also be evolving under positive selection.
Evolutionary forces acting on proteins can be measured by comparing the rates of nonsynonymous (dN) and synonymous (dS) nucleotide substitutions between coding sequences from closely related species. The dN and dS are expected to be equal for proteins evolving neutrally (dN/dS or ω = 1). Negative or purifying selection results in dN/dS of <1 and indicates that changes in the protein are deleterious and thus eliminated from the population. Positive selection (dN > dS, or ω > 1) can indicate that a sequence is evolving adaptively, as is often observed as a consequence of genetic conflict, such as that existing between virus and host, in which co-evolving host–parasite systems are perpetually changing in an unceasing arms race (Van Valen 1973).
Analysis of full-length CAL1 using the M0 model in PAML (Yang 2007) resulted in a global dN/dS < 1 (dN/dS = 0.184), which suggests that the gene overall is not subject to positive selection. Analysis of the N terminus of CAL1 resulted in the lowest dN/dS ratio (dN/dS = 0.094), compared with the C terminus (dN/dS = 0.182) and the middle region (dN/dS = 0.625), suggesting these individual regions are also not under positive selection. It is worth noting that the middle region of CAL1 cannot be aligned correctly due to the presence of many gaps and high sequence variability; the higher dN/dS ratio for this region is therefore most likely due to alignment artifacts.
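As an illustrative aid only, the interpretation of these ratios can be written out explicitly. The sketch below applies the ω thresholds described earlier to the values reported in the preceding paragraph; it does not reproduce the PAML estimation itself, and the function name and the optional tolerance parameter are hypothetical.

```python
def classify_omega(omega, tolerance=0.0):
    """Map a dN/dS ratio to a selection regime using the omega = 1 threshold.

    `tolerance` is a hypothetical convenience for treating values close to 1
    as neutral; it is not part of any published method.
    """
    if omega < 0:
        raise ValueError("dN/dS cannot be negative")
    if omega > 1 + tolerance:
        return "positive"
    if omega < 1 - tolerance:
        return "purifying"
    return "neutral"


# dN/dS values as reported in the text for each region of CAL1.
reported = {
    "full length": 0.184,
    "N terminus": 0.094,
    "C terminus": 0.182,
    "middle region": 0.625,
}

regimes = {region: classify_omega(omega) for region, omega in reported.items()}
most_constrained = min(reported, key=reported.get)
```

A region-wide ω below 1 is an average over sites, so it does not by itself rule out positive selection at a minority of codons; that possibility is what site-specific likelihood ratio tests are meant to address.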
Table: LRT of positive selection over the full sequence, N terminus, C terminus, and middle region of CAL1 (columns: 2Δl (M7 vs. M8); percentage of sites)
Table: LRT for branches under positive selection (column: 2Δl (H0 vs. H1))
Table: Residues under positive selection identified under the branch-site model LRT
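For readers unfamiliar with the 2Δl statistic named in these table headings, a likelihood ratio test compares twice the log-likelihood difference between two nested models with a χ² distribution. The sketch below is illustrative only: it assumes 2 degrees of freedom, the convention commonly applied to the M7 versus M8 comparison, and the log-likelihood values are invented for demonstration and are not results from this study.

```python
import math


def lrt_statistic(lnl_null, lnl_alt):
    """Return 2 * (lnL_alt - lnL_null), floored at zero for numerical safety."""
    return max(2.0 * (lnl_alt - lnl_null), 0.0)


def chi2_sf_df2(stat):
    """Survival function of the chi-square distribution with 2 degrees of freedom.

    For df = 2 the upper-tail probability has the closed form exp(-x / 2),
    so no external statistics library is needed.
    """
    return math.exp(-stat / 2.0)


# Hypothetical log-likelihoods for the null (M7) and alternative (M8) models.
lnl_m7 = -10250.40
lnl_m8 = -10249.10

stat = lrt_statistic(lnl_m7, lnl_m8)
p_value = chi2_sf_df2(stat)
significant = p_value < 0.05
```

Other model comparisons, such as the branch-site H0 versus H1 test, differ in the number of free parameters and would need a different reference distribution than the one assumed here.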
Given the lack of detectable diversifying selection within CAL1, we sought to determine whether the conserved N and C termini are evolving under neutral or purifying selection (Pond and Frost 2005). A total of 123 sites in the N terminus (~30 % of N-terminal sites) and 50 sites in the C terminus (~20 % of C-terminal sites) were found to be under purifying selection with significant p values, while only one such site was identified in the middle region (codon 696; see Supplemental Table 1). Several residues from within the Scm3 domain-like region identified by HHPred were found to be under purifying selection, consistent with high conservation in this region (Fig. 2a). Collectively, our analyses show that CAL1 is not evolving adaptively and that, in fact, several sites within the N and C termini of CAL1 are under purifying selection.
Thus, it appears that CAL1 does not participate in the meiotic drive observed for CID in this species group. We speculate that, while CID is rapidly evolving at sites where it makes contact with the DNA (Henikoff et al. 2001; Malik and Henikoff 2001; Malik et al. 2002), the interface between CAL1 and CID remains unchanged to ensure robust CID-CAL1 interaction and reliable CID recruitment at centromeres.
Could CAL1 have replaced an ancestral Scm3-like chaperone in flies?
The presence of HJURP/Scm3 proteins in such divergent species as fungi and vertebrates, and their absence in plants, insects, and nematodes, is an unresolved puzzle of centromere biology. The absence of universal CENP-A chaperones could be indicative of the existence of alternative molecular mechanisms mediating CENP-A assembly in diverse species. On the other hand, the observed lack of homologs could be misleading, and in fact evolutionarily distinct chaperones could use the same molecular strategies as HJURP/Scm3 to recognize and assemble CENP-A.
The discovery of HJURP/Scm3 and of its mechanisms of action has provided much sought-after insights into the pathways mediating accurate CENP-A recognition and assembly. If CAL1 indeed functions as a CID chaperone in Diptera, it will be interesting to investigate what advantages this protein offered over HJURP/Scm3 in this lineage, and whether worm and plant-specific chaperones with distinct homology signatures exist. Alternatively, CAL1 (and equivalent CENP-A binding factors specific to worms and plants) could function as an adaptor bringing together soluble CID and a general histone-assembly factor thereby mediating CID assembly indirectly. While this glimpse into the evolutionary history of CAL1 has afforded model predictions for refining the role of CAL1 in centromere assembly, future cell biological and structural studies will be needed to test this and other models.
We gratefully acknowledge Seth Kasowitz, Asav Dharia, Paul Talbert, Peter Gogarten, and Craig Nelson for help with the evolutionary analyses of CAL1 and Karolin Luger for suggesting HHPred. This work was funded by the National Science Foundation award number 1024973 to BGM.
- Abascal F, Zardoya R, Telford MJ (2010) TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations. Nucleic Acids Res 38:W7–W13
- Felsenstein J (2005) PHYLIP version 3.6 (distributed by the author). Department of Genome Sciences, University of Washington, Seattle
- Pond SL, Frost SD (2005) Datamonkey: rapid detection of selective pressure on individual sites of codon alignments. Bioinformatics 21:2531–2533
- Price MN, Dehal PS, Arkin AP (2010) FastTree 2--approximately maximum-likelihood trees for large alignments. PLoS One 5:e9490
- Schittenhelm RB, Althoff F, Heidmann S, Lehner CF (2010) Detrimental incorporation of excess Cenp-A/CID and Cenp-C into Drosophila centromeres is prevented by limiting amounts of the bridging factor Cal1. J Cell Sci 123:3768–3779
- Van Valen L (1973) A new evolutionary law. Evol Theory 1:1–30