Gene organization and evolutionary history

Heterochromatin protein 1 (HP1) was originally discovered through studies in Drosophila of the mosaic gene silencing that results when a euchromatic gene is placed near or within heterochromatin, the condensed state of chromatin that is a cytologically visible condition of heritable gene repression [1, 2]. This phenomenon is known as position-effect variegation (PEV), and HP1 is a dominant suppressor of it. The HP1 family of non-histone chromosomal proteins is involved in the establishment and maintenance of higher-order chromatin structures. Members of this evolutionarily conserved family have been discovered in almost all eukaryotic organisms, from fission yeast to plants to humans (Figure 1). An HP1 protein has not been observed in budding yeast (Saccharomyces cerevisiae), in which PEV is generated by the silent information regulatory (SIR) proteins [3]. The fission yeast (Schizosaccharomyces pombe) and Neurospora genomes each contain one HP1 homolog, Dictyostelium has two, and different animal species have up to five. Over the length of the protein, there is 50% amino-acid sequence identity between mammalian HP1 proteins and Drosophila HP1 [4].

Figure 1

A phylogenetic tree of HP1 proteins. Species shown are Caenorhabditis elegans (Ce), Drosophila melanogaster (Dm), Drosophila virilis (Dv), Dictyostelium discoideum (Dd), Gallus gallus (Gg), Homo sapiens (Hs), Mus musculus (Mm), Neurospora crassa (Nc), Schizosaccharomyces pombe (Sp) and Xenopus laevis (Xl). Most animal species have several HP1 isoforms, but those in Drosophila and C. elegans are not generally orthologous with particular mammalian isoforms. The tree was adapted from the combined data from the Wellcome Trust Sanger Institute Pfam protein family database [76] and Simple Modular Architecture Research Tool (SMART) database [77].

The HP1 family of proteins is encoded by a class of genes known as the chromobox (CBX) genes. There are three distinct proteins in the mammalian HP1 family, each of which is encoded by its own gene. In humans, HP1α is encoded by the Chromobox homolog 5 (CBX5) gene located on chromosome 12q13.13 [5]. The genes for HP1β (CBX1) and HP1γ (CBX3) are located on chromosomes 17q21.32 and 7p15.2, respectively. The murine Cbx5, Cbx1 and Cbx3 genes are located in regions of the mouse genome that are syntenic with those containing the orthologous human genes: 15qF3, 11qD and 6qB3, respectively [6]. This conserved synteny suggests that HP1 proteins have evolved under stringent evolutionary pressures and that their function has been strongly selected. Despite being approximately 65% identical, however, the proteins encoded by CBX5, CBX1 and CBX3 have distinct localization patterns [7].
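Percent identity figures such as those quoted above depend on how the alignment is scored. As a minimal sketch, the Python function below computes identity over the aligned columns in which neither sequence has a gap; this choice of denominator is an assumption, and other conventions give different values. The function name and the two short aligned fragments are hypothetical and are not HP1 sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped columns are excluded from the denominator (assumption)
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared

# Hypothetical pre-aligned fragments, invented for illustration only.
toy_a = "ACDEFGHIKLM"
toy_b = "ACDQFGH-KVM"
identity = percent_identity(toy_a, toy_b)
print(f"{identity:.1f}% identical")
```

The sequences must already be aligned; the function does not perform the alignment itself.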

Interestingly, the genomic structure of HP1-encoding genes is conserved from Drosophila to humans. The gene encoding Drosophila HP1, known as Su(var)2-5, along with the genes encoding mouse HP1s (Cbx5, Cbx1 and Cbx3) and human HP1s (CBX5, CBX1 and CBX3), each comprise five exons separated by four introns [5, 8] (Figure 2a). The translational start site is conserved within exon 2, but because of an extra intron within exon 1 of murine Cbx3, its translational start site is in exon 3 [8]. Except for murine Cbx3, the sequence encoding the chromodomain is in exons 2 and 3. Exons 3 and 4 of murine Cbx3 have fused into exon 4, so its chromodomain is encoded within exons 3 and 4. The chromoshadow domain is encoded in exons 4 and 5 for all members of the family [8]. Although the splice-site sequences are conserved across the mammalian HP1 family, the splice sites in Drosophila are distinct, suggesting that the genomic structure has been conserved without maintaining intron-exon boundaries.

Figure 2

Structure of HP1 proteins and the genes encoding them. (a) The conserved genomic structure of HP1-encoding genes from Drosophila to humans. Each gene is made up of five exons separated by four introns. The start (ATG) and stop codons are indicated. The exons encoding the chromodomain and the chromoshadow domain are indicated by brackets and arrows. Asterisks mark where murine Cbx3 (encoding HP1γ) differs from the arrangement shown: the start codon is in exon 3 and the chromodomain is encoded by exons 3 and 4 of this gene. (b) The conserved linear structure of HP1 proteins. N, amino terminus; C, carboxy terminus. (c) The overall three-dimensional structures of the chromodomain and chromoshadow domain of murine HP1β. Coordinates were downloaded from the Protein Data Bank (PDB) structural database and modeled using the Insight II program from Accelrys [78].

In addition to the three main HP1-coding genes in vertebrates, numerous HP1 pseudogenes have been discovered [5, 8, 9]. For example, in humans there is one CBX5 pseudogene, at least five CBX1 pseudogenes and eleven CBX3 pseudogenes. The scattering of pseudogenes throughout the genome suggests that HP1-like sequences have been duplicated multiple times during evolution.

The HP1 family is part of a larger superfamily of proteins containing chromatin organization modifier (chromo)domains. The chromodomain is an evolutionarily conserved region in the amino-terminal half of HP1 proteins, of approximately 30-60 amino acids [10]. All proteins containing this domain can characteristically alter the structure of chromatin to make heterochromatin. The chromodomain of HP1 shares greater than 60% amino-acid sequence identity with the chromodomain found in Polycomb, a silencer of homeotic genes [11]. Substituting the chromodomains of Polycomb and HP1 for each other changes their nuclear localization patterns accordingly, thus implicating the chromodomain in both target-site binding and target preference [12]. Sequences encoding chromodomain-containing proteins have been discovered in the genomes of animals and plants, suggesting that the chromodomain has a highly conserved structural role.

The HP1 proteins form their own family within the chromodomain superfamily, characterized by the presence of a second unique conserved domain in the carboxy-terminal half of the protein, known as the chromoshadow domain [13]. This domain shares amino-acid sequence identity with the chromodomain, but it has different functions (see below). The high level of similarity between the two types of domain suggests, however, that HP1-encoding genes could have arisen from a duplication of one of these domain sequences. Through evolution, one domain, more likely the chromoshadow domain, then diverged enough to facilitate distinct functions.

Although there are relatively few members of the HP1 family, considering their evolutionary longevity, their functional importance in evolution is clear. In cross-species experiments, the chromodomain from mouse HP1β can functionally replace the chromodomain of S. pombe HP1 [14], and expression of human HP1α can rescue the lethality of homozygous mutants in the Drosophila HP1-encoding gene Su(var)2-5 [5]. This high degree of conservation within two regions, the chromodomain in the amino-terminal half and the chromoshadow domain in the carboxy-terminal half, suggests that these domains are at the core of HP1 function and of the interaction of HP1 proteins with other molecules in the formation of condensed chromatin structure.

Characteristic structural features

The chromodomain superfamily, which contains the HP1 family, can be subdivided into three major classes on the basis of domain organization [13]. One class, characterized by the presence of a single chromodomain, includes Polycomb and mammalian modifier 3. A second class is identified by paired tandem chromodomains, as found in DNA-binding/helicase proteins, such as yeast CHD1 and mammalian CHD-1 to CHD-4. The third class consists of proteins containing both a chromodomain and the highly related chromoshadow domain, which includes all members of the HP1 family.

The sequence and structure of HP1 proteins can be divided into three regions (Figure 2b). First, the chromodomain is a module at the amino terminus that is responsible for HP1 binding to di- and trimethylated lysine 9 (K9 in the single-letter amino-acid code) of histone H3; these methyl groups are epigenetic marks for gene silencing [15, 16]. Second, the carboxy-terminal chromoshadow domain is involved in homo- and/or heterodimerization and interaction with other proteins. Third, the chromodomain is separated from the chromoshadow domain by a variable linker or hinge region containing a nuclear localization sequence. Each of these three segments will be discussed in detail from a structural perspective.

The chromodomain

The structure of the amino-terminal chromodomain alone has been analyzed by nuclear magnetic resonance spectroscopy [17]. The domain folds into a globular conformation approximately 30 Å in diameter, consisting of an antiparallel three-stranded β sheet packed against an α helix in the carboxy-terminal segment of the domain [17] (Figure 2c). A hydrophobic groove is formed on one side of the β sheet, which is composed of conserved nonpolar residues. Interestingly, comparison of this structure with the databases reveals a similar structure in two archaeal histone-like proteins, Sac7d and Sso7d [17]. This structure in Sac7d binds to the major groove of DNA in a nonspecific manner as a result of the net positive charge on the exterior of the β sheet. Unlike these archaeal DNA-binding proteins, however, in HP1 the β sheet has an overall negative charge, implicating the chromodomain as a protein-interaction motif rather than a DNA-binding motif.

The gene-silencing function of HP1 depends on an interaction between the chromodomain and the methyl K9 histone H3 mark [12, 18]. The hydrophobic pocket of the chromodomain provides the appropriate environment for docking onto this methylated residue. The bound segment of the H3 tail adopts a β-strand conformation, lying coplanar to and antiparallel with two β strands of the chromodomain, which completes a three-stranded β sheet [19, 20]. In addition, the methylammonium group in K9 is effectively caged by three aromatic side chains, whereas the surrounding residues of K9 contact specific sites within the chromodomain. This positioning makes sense of the functional defects and loss of methyl K9 binding upon mutation of key hydrophobic amino acids located in the amino-terminal part of Drosophila HP1 (Tyr24, Val26, Trp45 and Tyr48) [20]. Interestingly, no other combinations of naturally occurring amino acids have been found that interact with the chromodomain, indicating that the methylated histone mark is the sole binding partner for this domain [21].

Methylation occurs on other lysines within histone H3, as well as on the other histones. In fact, methylation on K27 of H3 occurs within the same short amino-acid sequence context as that on K9 (ARKS). This mark on K27 serves as a binding site for the Polycomb chromodomain [22]. The discrimination between these two highly related repressive marks has been examined [23]. The chromodomains of HP1 and Polycomb are structured similarly, but their peptide-binding grooves show distinct features that provide this discrimination. The main differences lie in the extent of protein-peptide interactions (Polycomb interacts with a larger number of the peptide residues surrounding the methyl lysine) and in context recognition, as HP1 finely discriminates the peptide residues in the immediate vicinity. Therefore, although the posttranslational mark, the surrounding histone sequence and the overall chromodomain structure are strikingly similar between them, the modes in which Polycomb and HP1 bind histone H3 and make essential interacting contacts are different.
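The shared ARKS context of K9 and K27 can be checked directly on the histone H3 tail. The short Python sketch below scans residues 1-40 of the canonical human H3 sequence (initiator methionine removed) for the motif and reports the position of the lysine in each match. The sequence is supplied here as an assumption for illustration and is not taken from this review; the function name is hypothetical.

```python
# Amino-terminal tail of canonical human histone H3, residues 1-40
# (initiator methionine removed). Supplied for illustration only.
H3_TAIL = "ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHR"

def find_motif_lysines(sequence, motif="ARKS"):
    """Return the 1-based position of the lysine in each occurrence of motif."""
    k_offset = motif.index("K")  # offset of the lysine within the motif
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start + k_offset + 1)  # convert to 1-based numbering
        start = sequence.find(motif, start + 1)
    return positions

arks_lysines = find_motif_lysines(H3_TAIL)
print(arks_lysines)
```

With this input the motif occurs twice, and the lysines fall at positions 9 and 27. An identical four-residue context cannot by itself distinguish the two marks, which is why discrimination has to come from residues further along the peptide.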

The chromoshadow domain

The overall structure of the chromoshadow domain is very similar to that of the chromodomain, with a globular conformation of approximately the same size [24] (Figure 2c). Like the chromodomain, the chromoshadow domain contains three β strands that form an antiparallel sheet. Unlike the chromodomain, which has a subsequent single α helix that folds against the sheet, the chromoshadow domain has two carboxy-terminal α helices.

Although the chromodomain remains monomeric in solution, the chromoshadow domain readily dimerizes under the same conditions [25]. The dimer interface involves a symmetrical interaction on helix α2, which lies at an angle of 35° to helix α2 of the other HP1 molecule [24]. Conserved residues that are unique to the chromoshadow domain are located at the dimer interface. As a result, this dimer structure creates a nonpolar groove that can accommodate HP1-interacting proteins containing the consensus sequence PXVXL [24] (see below).

The linker region

The two highly conserved chromo- and chromoshadow domains are separated by a less conserved linker or hinge region. This region has the most variable amino-acid sequence among HP1 proteins, both within a species and between species. The structure of the linker region has been proposed to be flexible and exposed to the surface [26]. The variable nature of this region has made it difficult to capture its three-dimensional structure with a variety of methods.

The linker is highly amenable to posttranslational modifications, especially phosphorylation [27-30]. In addition, modifications within this region have been shown to affect localization, interactions and function. The linker could therefore be a central control region in the regulation of HP1 proteins.

Localization and function

As their name suggests, the localization and roles of HP1 proteins in heterochromatic regions have been well studied. More recent studies have made it increasingly clear, however, that HP1 proteins localize not only to heterochromatic regions but also to euchromatic regions [27, 31-33]. This localization appears to be isoform-specific: in mammalian cells, HP1α and HP1β are mainly heterochromatic, whereas HP1γ is observed in both heterochromatin and euchromatin [32]. Recently, our laboratory has shown that each HP1 isoform is regulated by posttranslational modifications, such as acetylation, phosphorylation by multiple kinases, methylation, ubiquitination and sumoylation, in a similar way to histones [27]. Interestingly, modification of a specific residue, Ser83 of HP1γ, defines a subpopulation of this isoform that is exclusive to euchromatin [27]. It can therefore be extrapolated that the subnuclear localization of HP1 proteins is determined not only by their interactions with other proteins, but also by a combination of protein interactions with particular posttranslational modifications.

Repetitive DNA elements are found at centromeres and telomeres and are enriched with HP1 [34]. HP1 proteins have been localized to the nuclear periphery, and this may be associated with their interaction with the lamin B receptor and/or with the localization of centromeric heterochromatin [35, 36]. In addition to the DNA repeats present in centromeres and telomeres, repetitive DNA sequences that are spread throughout euchromatin can also be associated with heterochromatin formation. HP1 has also been shown to be a mediator of more refined silencing at single-copy genes in euchromatic regions [37-39]. In Drosophila, HP1 has recently been shown to co-localize with transcriptionally active domains of polytene chromosomes and, in both mouse and human, HP1 proteins, in particular HP1γ, have been associated with transcriptional elongation [27, 40]. Thus, despite its name and its predominant localization at heterochromatin, HP1 seems to have different roles in different nuclear environments.

The most common of the HP1 functions is the formation of heterochromatin. One model of heterochromatin formation involves a circular recruitment based on binding to methyl K9 histone H3. HP1 is recruited to the methylated K9 mark through the histone K9 methyltransferase SUV39H1 [16, 41]. In turn, HP1 recruits more SUV39H1, which propagates the methyl K9 mark so that it spreads along a locus, with subsequent recruitment of additional HP1 molecules. This model has also been extended to DNA methylation, as both HP1 and SUV39H1 recruit DNA methyltransferases [42]. It is noteworthy that, in some cases, histone H3 K9 methylation precedes DNA methylation [43-48], supporting the notion that these molecules participate in a recruitment loop for gene silencing.
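The logic of this recruitment loop can be illustrated with a deliberately simplified toy model. In the Python sketch below, a locus is a linear array of nucleosomes, one of which starts with the methyl K9 mark; in each round, HP1 binds every marked nucleosome and the SUV39H1 it recruits methylates the immediate neighbors. The nearest-neighbor rule, the deterministic updates and the absence of boundary elements or demethylation are all assumptions made for illustration, and the function name is hypothetical; this is not a quantitative model from the literature.

```python
def spread_methylation(n_nucleosomes, seed, rounds):
    """Toy model of the HP1/SUV39H1 recruitment loop on a linear nucleosome array.

    Each round: HP1 binds every nucleosome carrying the methyl K9 mark, and the
    SUV39H1 it recruits methylates the immediate neighbors on both sides.
    """
    methylated = [False] * n_nucleosomes
    methylated[seed] = True  # initial methyl K9 mark
    for _ in range(rounds):
        # Snapshot of HP1-bound (methylated) positions at the start of the round.
        hp1_bound = [i for i, marked in enumerate(methylated) if marked]
        for i in hp1_bound:
            for j in (i - 1, i + 1):
                if 0 <= j < n_nucleosomes:
                    methylated[j] = True
    return methylated

state = spread_methylation(n_nucleosomes=11, seed=5, rounds=3)
print("".join("M" if marked else "-" for marked in state))
```

In this sketch the marked domain grows by one nucleosome on each side per round, so the single seed expands to seven marked nucleosomes after three rounds.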

In addition to binding methylated K9 of histone H3, HP1 has been observed to interact directly or indirectly with several non-histone proteins with a wide variety of functions. These partners are involved in cellular processes ranging from transcriptional regulation, chromatin modification and replication to DNA repair, nuclear architecture and chromosomal maintenance (Table 1). Interestingly, these interactions can occur in either a manner specific to one HP1 isoform or universally with all three isoforms, and they can also depend on particular posttranslational modifications of HP1 [27]. For example, Ku70, a protein involved in repair of DNA double-strand breaks, appears to interact with HP1γ only upon phosphorylation of Ser83 of HP1γ, whereas HP1α interacts with Ku70 under native conditions [27, 49]. One mechanism of chromoshadow domain binding is through a PXVXL motif present in various other proteins, which is sufficient for interaction with dimerized chromoshadow domains [21]. Interaction occurs through binding of the peptide across the HP1 dimer interface, so that it forms a parallel β sheet with the carboxy-terminal tail of one monomer and an antiparallel β sheet with the tail of the other monomer [50]. Targeting of HP1 to heterochromatin has been shown to require this interaction with PXVXL-containing proteins in addition to the necessity of methyl K9 histone H3 recognition [50].
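Because the PXVXL consensus is a simple pentapeptide pattern (proline, any residue, valine, any residue, leucine), candidate motifs can be located in a protein sequence with a regular expression. The Python sketch below is one way to do this; the function name and the test peptide are hypothetical, and a sequence match alone is only a candidate site, not evidence of binding to the chromoshadow domain dimer.

```python
import re

# Consensus pentapeptide from the text: P-X-V-X-L, where X is any residue.
# The lookahead makes overlapping occurrences visible as separate matches.
PXVXL = re.compile(r"(?=(P[A-Z]V[A-Z]L))")

def find_pxvxl(sequence):
    """Return (1-based start, matched pentapeptide) for every PXVXL occurrence."""
    return [(m.start() + 1, m.group(1)) for m in PXVXL.finditer(sequence.upper())]

# Hypothetical peptide, invented for illustration; not a real HP1 partner.
toy_peptide = "MSEKPRVSLAAGPTVALQ"
hits = find_pxvxl(toy_peptide)
print(hits)
```

For the toy peptide this reports two candidate motifs, PRVSL starting at residue 5 and PTVAL starting at residue 13.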

Table 1. Examples of HP1-interacting partners

The chromoshadow domain is important for both the homo- and the heterodimerization properties of HP1 as well as its interaction with other molecules. HP1 molecules readily dimerize with each other through their chromoshadow domains [24, 35, 51]. There appear to be differences in preferences for dimerization between particular isoforms, although this may vary with conditions such as phosphorylation status. Dimerization between HP1 molecules has been shown to occur between the carboxy-terminal α helices of each monomer. The dimer interface involves contact with the key residues Ile161, Tyr164 and Leu168 of mouse HP1β or the equivalent residues in other proteins [25]. These residues are conserved in all mouse and human HP1 isoforms, as well as in Drosophila HP1.

The importance of HP1 in normal development is suggested by the phenotype of the homozygous mutation of the gene encoding HP1 in Drosophila, Su(var)2-5: lethality at the third instar larval stage [52]. This developmental stage coincides with the time that the maternal supply of HP1 proteins normally becomes reduced.

The RNA interference (RNAi) machinery has also been found to be essential for the establishment and maintenance of heterochromatin domains. Loss of or mutations in components of the RNAi machinery in S. pombe, Drosophila and mouse result in abnormal localization of HP1 [53-55]. In one report (since retracted), production of small interfering (si)RNA was not affected in the absence of HP1 [56], suggesting that HP1 is not involved in the initiation of RNAi but rather functions downstream of the RNAi pathway.


HP1 proteins have been a subject of active investigation for over a decade. Today, a significant amount is known about the structural and basic biochemical properties of these proteins. Many questions remain to be addressed, however. The diversity of binding partners combined with the isoform specificity of binding implicates HP1 proteins in many nuclear processes. Given the high degree of similarity between the three isoforms, the factors that influence these differences remain unknown. Despite the identification of so many HP1 binding partners, the signaling cascades that mediate interaction with these proteins in order to ultimately 'switch on' or 'switch off' gene silencing also remain poorly defined. Thus, it is essential to define these pathways if we are to map useful networks of membrane-to-chromatin signaling cascades and understand better the regulation of both activation and repression. With each HP1 isoform further regulated by posttranslational modifications similar to those that make the histone code possible, we are seeing the emergence of a new paradigm that includes an HP1-mediated subcode in conjunction with the histone code. This is a significant step forward for this field of research and means that the possible combinations become endless. We anticipate that HP1 will continue to be an active field of research and that future studies in this field will be exciting and illuminating, not only for this protein family, but in the larger context of chromatin dynamics.