Gene organization and evolutionary history

Gene structure and chromosomal localization

Salmonid fish can have as many as 15 closely related protamine genes per haploid genome, coding for as many as six different proteins [1]. Birds carry two virtually identical copies of the same gene per haploid genome [2], and only a single copy each of the genes for protamines P1 and P2 has been detected in mammals [3]. Even though it is likely that the protamine P2 gene derives from a duplication of the protamine P1 gene, the two proteins appear to be rapidly diverging in amino-acid sequence.

The mammalian P1 and P2 genes contain a single intron (Figure 1), whereas the protamine genes from birds (chicken and quail) and salmonid fish are intronless. Detailed alignments of the cis-acting regulatory sequences have identified several consensus sequences. These include conserved cAMP-response elements, the TATA box, a CAP site, and a polyadenylation signal [4]. The two chicken protamine genes are clustered together within 6 kb of each other. The genes for human P1 and P2 are similarly co-located in a tight cluster on chromosome 16 at 16p13.2 [5]; this cluster also contains the gene for transition protein-2, which is also involved in chromosome condensation. A similarly arranged protamine cluster is found on chromosome 16 in the mouse [3]. In human, mouse, rat and bull the protamine cluster also contains an open reading frame that has been referred to as 'gene 4' [6] or 'protamine 3' [7]. The predicted amino-acid sequence for this protein, which would be approximately the same size as protamine P2, contains stretches of repeating glutamic and aspartic acid residues similar in number and distribution to the clusters of arginine and lysine residues found in the DNA-binding domains of protamines. This difference in composition (a high content of negatively charged amino acids compared with the high content of positively charged amino acids in protamines) suggests that the gene 4 protein, which is not likely to bind to and condense DNA, may instead bind to and interact with the protamines and perform some other function related to chromatin repackaging.

Figure 1

Primary structures of mouse protamine genes and proteins. Schematic representation of the mouse (a) mP1 and (b) mP2 proteins. Numbers denote amino acid residues; the two exons encoding each protein are shown as bars below the proteins; important residues are indicated as shown in the key (using letters in the single-letter amino acid code, with a subscript number indicating the residue number). Proposed DNA anchoring domains are regions containing three or more consecutive arginine and lysine residues. Proposed phosphorylation sites are amino acid residues identified as phosphorylated in proteins isolated from sperm or following in vitro incubation of the isolated proteins of five mammalian species with cAMP-dependent protein kinase or protein kinase C.

Whereas the protamine P1 gene (PRM1) appears to be transcribed and translated in the spermatids of all mammals [8], the protamine P2 gene (PRM2) is translationally regulated in a species-specific manner. PCR amplification techniques have confirmed the presence of PRM1 in a wide variety of eutherian mammals, but attempts to amplify PRM2 and hybridizations of PRM2 cDNA probes to genomic DNA have revealed that PRM2 gene sequences exhibit considerable divergence and may be less widely distributed phylogenetically [9-11]. All primates, and most rodents examined so far, produce sperm that contain both P1 and P2 protamines [12, 13], showing that the P1 and P2 genes are both transcribed and translated. The sperm of perissodactyls (horse, zebra and tapir), lagomorphs (rabbit and hare) and proboscids (elephants) have also been found to contain processed protamine P2 and to use P2, in combination with P1, to package their sperm genomes. The sperm of most other species appear to contain only P1. In some of these species (for example, bull and boar), the gene for protamine P2 is present, but it seems to be dysfunctional or to produce an aberrant protein.

Evolution of protamines

Several excellent reviews describe the basic nuclear proteins that package DNA in the sperm of plants [14] and animals [15-20]. During sperm development in animals, the histones that package DNA in early spermatids are removed from the DNA and replaced in the final stages of spermatid maturation by one of three types of proteins: sperm-specific histones, protamine-like proteins or protamines. In mammals, protamines do not replace the 'somatic' histones directly; instead, the differentiating spermatids synthesize a group of so-called transition proteins that bind to the spermatid DNA in advance of the protamines. Comparative analyses of the three families of proteins listed above suggest that the process of preparing the sperm's genome for fertilization probably evolved from using specialized (sperm-specific) histones to protamine-like proteins to protamines. This is not to say, however, that the protamines of all animals are closely related or structurally similar. Comparisons of the amino-acid sequences of vertebrate and invertebrate protamines show that the protamines from all animals do not constitute a true family, and that the sequence, structure, and possibly function of protamines are evolving independently in vertebrates and various invertebrate groups (mollusks, cephalopods and tunicates).

Sperm-specific histones, which have been identified in sperm from a wide range of species (from echinoderms to primates), are amino-acid sequence variants of somatic histones. Among the best studied sperm-specific histones are those from echinoderms (sea urchins), agnathans (lamprey and hagfish) and sponges. In echinoderms, sperm contain both somatic-type histones and specialized Sp H1 and Sp H2B histone variants, which are believed to participate in compacting the chromatin. These specialized histones are synthesized just before meiosis, and they partially (in echinoderms) or completely (in tunicates) replace their somatic counterparts.

Protamine-like proteins have been found in the sperm of many species, ranging from sponges to amphibians, and comprise the most heterogeneous group of sperm basic nuclear proteins. These proteins have a higher lysine and arginine content (35-50% Arg and Lys) than histones, and they are considerably larger (generally containing 100 to more than 200 amino acids) than the proteins designated as true protamines. The few protamine-like proteins that have been analyzed in detail have structural features in common with both histones and protamines. Like histone H1, these proteins have a protease-resistant globular core and unstructured carboxy- and amino-terminal domains enriched in basic amino acids [21]. Post-translational cleavage of the protamine-like proteins in mussel, cuttlefish and razor clam yields small arginine- and lysine-rich fragments similar to protamines. Sequence comparisons of the cleavage-generated mussel protamine-like fragments and mammalian protamine P1 show that the mussel fragments exhibit significant similarity (around 50%) to mammalian protamines. They differ, however, in that the majority of the positively charged residues in the mussel sequence are lysine, rather than arginine. One explanation for this difference comes from sequence analyses of the sperm-specific H1 histones and the protamines of sea squirts [22]. An analysis of codon usage in the Ciona and Styela protamine-like proteins showed that the observed evolution could not be derived by point mutations, but that a frameshift mutation in the carboxy-terminal end of the lysine-rich sperm-specific H1 histone could lead to the arginine-rich sequence observed in Styela protamine. Together, these observations provide compelling support for the hypothesis that the protamines evolved from H1-like histones [22-25].

The true protamines are typically short proteins (50-110 amino acids) that can contain up to 70% arginine. Gene and protein sequences (see Additional data file 1) have been determined for protamines from more than 100 vertebrate species, and the true protamines are the best characterized of the sperm basic nuclear proteins. Two structural elements have been identified in all vertebrate protamines. One is a series of small 'anchoring' domains containing multiple arginine or lysine amino acids (three or more per domain, highlighted in red in the figures in Additional data file 1) that are used to bind the protein to DNA. The second is the presence of multiple serine and threonine residues that can be used as phosphorylation sites. The protamines of insects, birds, teleost fish, reptiles and most marsupials lack cysteine, whereas those present in eutherian mammals all contain multiple cysteine residues that are oxidized to form disulfide bridges that link the protamines together and stabilize the chromatin complex during the final stages of sperm maturation.
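The two sequence-level criteria used in this section (arginine content, and 'anchoring' domains of three or more consecutive arginine or lysine residues) can be illustrated with a short scan over a one-letter amino-acid string. This is only a minimal sketch: the function names are hypothetical, and the example sequence is a made-up toy string, not a real protamine.

```python
import re

def residue_fraction(seq, residues="R"):
    """Fraction of positions in seq occupied by any of the given residues
    (default: arginine only)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(1 for aa in seq if aa in residues) / len(seq)

def anchoring_domains(seq, min_run=3):
    """Find runs of at least min_run consecutive arginine (R) or lysine (K)
    residues. Returns (start, end, run) tuples with 1-based, inclusive
    positions."""
    pattern = re.compile(r"[RK]{%d,}" % min_run)
    return [(m.start() + 1, m.end(), m.group())
            for m in pattern.finditer(seq.upper())]

# Hypothetical toy sequence for illustration only (not a real protamine).
TOY_SEQ = "MSRRKTYCRRRRRSGKRQ"

if __name__ == "__main__":
    print("Arg fraction:", round(residue_fraction(TOY_SEQ), 3))
    print("Arg+Lys fraction:", round(residue_fraction(TOY_SEQ, "RK"), 3))
    for start, end, run in anchoring_domains(TOY_SEQ):
        print(f"anchoring domain {start}-{end}: {run}")
```

The two-residue stretch near the end of the toy sequence is not reported, because it falls below the three-residue threshold used in the definition above.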

Two groups have independently reported evidence that positive selection for the maintenance of a large number of arginine residues (rather than selection at any particular position) is being applied to protamine P1 in many species of mammals [26, 27]. The driving forces for this selection are not known, but among those proposed, the more likely are those that influence the stability of the sperm chromatin complex. Protamines with a higher arginine content form more stable complexes with DNA and are more efficient at displacing histones and transition proteins from DNA. The abundance of arginine has also been suggested to be important for the subsequent remodeling of the sperm chromatin complex following fertilization [28].

Characteristic structural features

Most of the structural information obtained for protamines and DNA-protamine complexes has been derived from protamines P1 and P2 of placental mammals, and from the fish protamines salmine and clupine.

Protamine P1 and P1-like fish protamines

The P1 protamines of placental mammals are typically 49 or 50 amino acids long and contain three domains: a central arginine-rich DNA-binding domain flanked on both sides by short peptide segments containing cysteine residues. The protamines of monotremes and most marsupials have sequences similar to those of the placental mammals, except that they lack cysteine residues. One genus of shrew-like dasyurid marsupials, the Planigales, is an interesting exception to this generalization, as they have gained five or six cysteine residues in their P1 protamines since their divergence from the other dasyurids [29]. In most species, the central DNA-binding domain typically consists of a series of anchoring sequences containing 3-11 consecutive arginine residues, which bind the protein to DNA. This domain is similar in size and composition to the entire sequence of many fish protamines [30-33]. Sequence comparisons of the fish protamines with mammalian P1 protamines show that the arginine-rich regions containing the anchoring domains are conserved (around 60-80% sequence identity), but the remainder of the protein sequence exhibits considerable variation.
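Percent sequence identity figures such as those quoted above depend on how the alignment is scored. The sketch below shows one common convention, counting identical residues over the aligned columns in which neither sequence has a gap; other conventions use a different denominator and give different values. The function name and the two aligned strings are hypothetical and serve only as an illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.
    Columns where either sequence has a gap are excluded from the count
    (an assumed convention; others exist)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# Hypothetical aligned fragments (toy data, not real protamine sequences).
FRAG_A = "RRRRSSRPVRRRR"
FRAG_B = "RRRRTS-PIRRRR"

if __name__ == "__main__":
    print(f"{percent_identity(FRAG_A, FRAG_B):.1f}% identity")
```

Because arginine-rich regions are low in complexity, identity values of this kind should be read alongside the alignment itself rather than on their own.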

Structural studies of the protamines and their complexes with DNA have been limited primarily to bull P1 and salmine. Raman spectroscopy has shown that the free protamine is unstructured in solution [34]. Upon binding to DNA, P1 wraps around the DNA helix in the major groove (Figure 2a) [35], with one protamine molecule being bound per turn of DNA helix [36]. Although not all the structural details of the DNA-protamine complex have been resolved (a crystallographic or nuclear magnetic resonance structure of a DNA-protamine complex has not yet been obtained), the predominant interactions that contribute to the remarkable stability of the complex are the combination of hydrogen bonds and electrostatic bonds that form between the guanidinium groups of each arginine residue in the anchoring domains of the protamine and the phosphate groups in both DNA strands.

Figure 2

Protamine molecules bind in the major groove of DNA, neutralizing the phosphodiester backbone of DNA and causing the DNA molecules to coil into toroidal structures. (a) Model showing how two adjacent salmon protamine molecules (blue atoms) wrap around the DNA helix (white atoms) and bind within the major groove of DNA. (b) Scanning-probe images of toroidal DNA-protamine complexes prepared in vitro on a graphite surface by adding protamine to DNA attached loosely to the surface. The toroids formed in vitro are similar in size and shape to those isolated from human sperm chromatin (c). (c) Scanning-probe microscope images of native DNA-protamine toroids obtained from human sperm chromatin. These toroids, which comprise the basic subunit structure of protamine-bound DNA, contain approximately 50,000 bp of DNA coiled into each donut-shaped structure.

In contrast to the coiling of DNA by histones into nucleosomes and the higher-order arrangements of nucleosomes in somatic chromatin, the binding of salmine or protamine P1 induces the coiling and condensation of DNA into much larger toroidal chromatin subunits [35, 37]. These toroidal subunits (Figure 2b,c), which have been observed in native sperm chromatin [35] and have also been induced in vitro by protamine binding to DNA [38], are approximately 50-70 nm in diameter, 25 nm thick and have been estimated to contain approximately 50,000 bp of closely packed, coiled DNA [39].
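To give a feel for the degree of coiling these figures imply, the following back-of-the-envelope sketch converts the estimated DNA content of a toroid into a contour length and a rough lower bound on the number of loops. It assumes a rise of 0.34 nm per base pair (the canonical B-form value, which is not taken from this article) and treats every loop as being as long as the outer circumference, which underestimates the true loop count because inner loops are shorter. Function names are hypothetical.

```python
import math

# Assumption: canonical B-form DNA rise per base pair (not from the article).
RISE_NM_PER_BP = 0.34

def contour_length_nm(n_bp, rise_nm=RISE_NM_PER_BP):
    """Contour length of n_bp base pairs of DNA, in nanometers."""
    return n_bp * rise_nm

def min_loops(n_bp, outer_diameter_nm, rise_nm=RISE_NM_PER_BP):
    """Lower bound on the number of loops in a toroid, assuming every loop
    has the length of the outer circumference."""
    return contour_length_nm(n_bp, rise_nm) / (math.pi * outer_diameter_nm)

if __name__ == "__main__":
    n_bp = 50_000  # approximate DNA content per toroid, as quoted in the text
    print(f"contour length: {contour_length_nm(n_bp) / 1000:.1f} micrometers")
    for diameter in (50, 70):  # diameter range quoted in the text, in nm
        print(f"diameter {diameter} nm: at least {min_loops(n_bp, diameter):.0f} loops")
```

Under these assumptions, roughly 17 micrometers of DNA is wound into a structure tens of nanometers across.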

Protamine P2

While protamine P1 and P2 seem to have been derived from a common ancestral precursor, P2 has several features that distinguish it from P1. At the sequence level, P2 protamines from different species exhibit the same variation as observed in P1 protamines (over 60% sequence identity among P2 molecules; 50-70% sequence identity between P2 and P1 molecules). However, the gene PRM2 codes for a precursor protein that has been shown to bind to DNA and then undergo proteolytic processing. The processing event, which has been examined in detail for mouse P2, occurs over a period of several days in late-step spermatids and results in the production of six partially processed forms of the precursor [40, 41]. When processing is complete, approximately 40% of the amino terminus of the molecule has been removed. The fully processed form of the P2 precursor, protamine P2, is slightly larger than P1 (63 amino acids in mouse) and is the predominant form of P2 in the mature sperm head.

Unlike in rodents and most other species of mammals, two differently processed forms of protamine P2 - P2 and P3 - are bound to DNA in human, ape and Old World monkey sperm [12, 42]. Only one processed form is observed in New World monkeys. The two forms of the P2 protein differ only in their three amino-terminal amino acids - P3 is three amino acids shorter (at 54 amino-acid residues) than P2 (57 amino acids) - and they seem to be products of the same PRM2 gene. A third protamine P2 sequence variant has also been detected in human [42] and macaque [12] sperm.

Protamine P2 also differs from P1 in that P2 binds zinc. Physical measurements performed on intact sperm from different species show that the P2 protamines from human, mouse and hamster coordinate one zinc atom per molecule [43]. Different zinc-finger models have been proposed for the zinc coordination site(s) in human P2 [44, 45]. However, none of these models is consistent with the conserved histidine and cysteine residues present in the majority of known P2 protamines. These models also require the majority of the protamine P2 sequence to wrap around and coordinate zinc. Such structures would not be expected to bind to the length of DNA sequence that has been estimated to be the P2 protamine footprint [36]. In stallion spermatids, zinc appears to play an integral role in sperm chromatin maturation [46]. A significant fraction of the zinc is lost from sperm chromatin when the cysteine thiols in protamine are oxidized into disulfide bonds. More recent X-ray absorption fine-structure studies of the zinc bound to protamine in intact elongating hamster spermatids and epididymal sperm (C Dolan, K Peariso, M Corzett, J Mazrimas, J Pennerhahn and RB, unpublished observations) suggest that the amino-acid residues involved in the coordination are located near the carboxy-terminal end of protamine P2 and that the residues involved in the coordination change when the intra- and inter-protamine disulfide crosslinks form.

Localization and function

Subcellular distribution and tissue expression patterns

The protamines are synthesized in soluble polyribosomes in the cytoplasm of elongating spermatids [47], and they bind to and package all but a very small subset of the sperm genome. One notable exception is human sperm. In humans, and possibly other primates, a significant fraction (10-15%) of the sperm's genome is packaged by histones. These histones, many of which are variants of their somatic histone counterparts, package the DNA into nucleosomes that are more closely packed than in somatic chromatin [48]. This is surprising considering the absence of histone H1, the extensive acetylation of histones H3 and H4, and the phosphorylation of the H2aX histone variant in these nucleosomes. Although most of the genes that retain their histone packaging in human sperm have not yet been identified, they do seem to represent a unique subset of the sperm genome [49, 50]. DNA that has been identified as packaged by histones in mature sperm includes the genes for ε-globin and γ-globin [51] and telomeric DNA [52].

Biochemical analyses [53] and immunohistochemical staining using protamine-specific antibodies [54] of the various stages of spermiogenic cells have shown that the protamines first appear in elongating spermatids, coincident with the initiation of the final stage of chromatin condensation. Transcription and translation of protamine mRNAs have been shown to occur in specific spermatid stages [55, 56], and protamine mRNA has not been detected in Sertoli or interstitial cells or in other tissues [57, 58]. There is also no clear biochemical evidence demonstrating the presence of protamines in other cells or tissues. The UniGene database reports numerous expressed sequence tags (ESTs) for PRM1 and PRM2 in some human non-testis or germ-cell cDNA libraries, including fetal brain, kidney and placenta. At present, however, it is not clear whether these ESTs are artifacts or the result of ectopic expression, or whether they indicate biologically relevant expression of protamine mRNAs or proteins in non-sperm cells.

Post-translational modifications

In mammalian P1 protamines, the DNA-binding domain and the amino-terminal peptide sequence flanking the DNA-binding domain typically contain one or more phosphorylation sites. These sites, which have been identified in human, stallion and bull P1, seem to be phosphorylated immediately after the protein is synthesized and again following the sperm's entrance into the egg. The unprocessed form of the P2 precursor, as well as various processed forms, is also phosphorylated [59]. The predominant phosphorylation sites involve serine and threonine, although tyrosine residues have also been found to be phosphorylated in rat protamine [60]. The function of protamine phosphorylation has not yet been determined, but it has been proposed that the addition of phosphates to specific serine residues may prevent these regions from interacting with DNA.

The only other known post-translational modifications of protamines are the disulfide bonds that form during the final stages of sperm maturation and epididymal transit in eutherian mammals. After binding of the P1 protamines to DNA, the thiol groups of the cysteines located in the amino- and carboxy-terminal domains of P1 form both intra- and inter-protamine disulfide bonds [37]. These covalent cross-links interlock neighboring protamine molecules together and prevent their removal or dissociation from DNA until the disulfides are reduced after the sperm enters the egg. Protamine P2 is also post-translationally modified through the production of inter-protamine disulfide bonds. Which protamine P2 cysteine residues participate in the formation of the disulfide crosslinks and how many disulfides are formed are not known.

Protamine functions

Several possible functions have been proposed for protamines but only one has been unequivocally demonstrated. The synthesis and deposition of protamine in spermatid chromatin has been shown to correlate temporally with the condensation of the genome of the elongating spermatid and the concomitant termination of transcription [61]. Each protamine P1 molecule binds to 10-11 bp of DNA; protamine P2 binds to a slightly larger segment of DNA (around 15 bp). This binding neutralizes the negative charge along the phosphodiester backbone of DNA and enables adjacent DNA molecules to pack close together. In the sperm of eutherian mammals, the DNA-bound protamines are finally locked into place during epididymal transit by the formation of a network of disulfide bonds. The inactivation of the majority of the spermatid's genes paves the way for the reprogramming of the male genome and the initiation of embryonic development. It also ensures that the male genome does not begin functioning as a testicular cell once it fertilizes the egg.
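The binding footprints quoted above lend themselves to a simple stoichiometric estimate. The sketch below counts the backbone phosphates spanned by one footprint (two per base pair, one on each strand) and the number of protamine molecules needed to cover a given length of DNA. It assumes complete, gap-free and non-overlapping coverage by a single protamine type, which is an idealization; the function names are hypothetical.

```python
def phosphates_in_footprint(footprint_bp):
    """Backbone phosphates spanned by a footprint: one per strand per base pair."""
    return 2 * footprint_bp

def molecules_to_cover(n_bp, footprint_bp):
    """Protamine molecules needed to cover n_bp of DNA, assuming contiguous,
    non-overlapping footprints of a single protamine type (an idealization)."""
    if footprint_bp <= 0:
        raise ValueError("footprint must be positive")
    return n_bp / footprint_bp

if __name__ == "__main__":
    toroid_bp = 50_000  # approximate DNA content per toroid, as quoted in the text
    for name, footprint in (("P1, 10 bp", 10), ("P1, 11 bp", 11), ("P2, ~15 bp", 15)):
        print(f"{name}: {phosphates_in_footprint(footprint)} phosphates per footprint, "
              f"about {molecules_to_cover(toroid_bp, footprint):.0f} molecules per toroid")
```

On these assumptions, a P1 footprint spans 20-22 phosphates, and a toroid of approximately 50,000 bp would require on the order of several thousand protamine molecules.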

Although it is generally accepted that sperm chromatin condensation does not play a direct role in the shaping of the sperm head, protamine binding to DNA does result in the production of an uncharged chromatin complex that enables the DNA molecules to be condensed into a volume some 1/20th that of a somatic nucleus. This condensation enables the production of a smaller, more hydrodynamic head, and contributes, albeit indirectly, to head shape. This is consistent with the observation that sperm containing improperly packaged chromatin frequently have enlarged or abnormal head shapes [62].

What little we currently know about the interrelationships between P1 and P2, chromatin organization and male fertility has been obtained from studies of mammalian sperm, primarily from transgenic mice, and from in vitro studies of sperm injected into oocytes. Gene knockout experiments have provided convincing evidence that the presence of both P1 and P2 is required for proper spermatid maturation and male fertility in the mouse [63]. In addition, maintaining the correct proportion of the two protamines in mice has been shown to be critical for maintaining the integrity of the sperm chromatin. Mouse sperm deficient in protamine P2 have increased DNA damage, incomplete chromatin condensation and other defects that block embryonic development beyond the blastocyst stage [64]. Other studies have suggested that incomplete processing of the P2 precursor could also have an impact on sperm function and could contribute to male infertility in both mouse and man [65-70].

These and numerous other studies of human sperm that vary in their content of P1 and P2 have begun to provide compelling evidence that alterations in the composition of sperm chromatin and its structural organization (to which P1 and P2 contribute) may affect both fertilization and early events in embryonic development [71]. Intracytoplasmic injection of human sperm that lack the proper amounts of protamines P1 and P2 into human oocytes has revealed that many of these sperm decondense prematurely in the oocyte [72, 73], which results in failed fertilization. Comparisons of the rates of decondensation of sperm from five mammal species with different natural protamine P1/P2 ratios injected into hamster oocytes have suggested one possible explanation - that the protamine P2 content of sperm may regulate the rate at which sperm chromatin decondenses and the male genome is reactivated following fertilization [74]. Sperm containing a higher proportion of P2 (for example, human and hamster) were observed to decondense more quickly in oocytes than sperm containing very little (for example, rat) or no (for example, bull) P2. Because the progression of development beyond the initial fertilization event in mammals requires the sperm cell to complete the decondensation process within a particular period of time after entering the oocyte, and this time varies among species, differences in the P2 content of sperm chromatin may provide a mechanism for 'bar-coding' an incoming genome and identifying it as acceptable or not. This is consistent with the observation that the P2 content and the P1/P2 ratio of sperm chromatin seem to be tightly regulated within a species [75] but vary dramatically between species. It might also explain why human males that produce sperm containing abnormal proportions of P1 and P2 are infertile.

Frontiers

The information that has been obtained about the protamine family of DNA-binding proteins during the past two decades is beginning to have an impact on several very different areas of future research in reproductive biology, evolutionary biology, gene therapy and nanotechnology. An increasing number of biochemical studies of sperm produced by infertile males and transgenic animals have provided evidence that changes in the protamine content of sperm chromatin, incomplete protamine P2 precursor processing, alterations in the P1/P2 ratio or deficiencies in zinc (or replacement of zinc by other metals) may contribute to male infertility. Exposure of males to certain alkylating agents has also been reported to have an adverse impact on reproduction by modifying cysteine residues in protamines and inducing dominant lethal mutations. This information, together with our present knowledge of protamine structure and function and advances in transgene technology, makes it possible to test directly how specific changes in protamine structure (for example, the removal of phosphorylation sites, changes in P2 processing sites or removal of functional thiols and zinc-coordinating residues) and sperm chromatin composition affect protamine function, chromatin packaging, male fertility and the progression of early embryogenesis. We have also learned enough about how protamine functions to create small synthetic DNA-binding proteins or peptides for use in packaging, protecting and aiding the delivery of functional genes to selected cell populations for use in gene therapy [76-78], gene silencing [79] or in targeting toxic genes to tumor cells [80]. The self-assembling nature of the protamine-DNA complex may also provide a new approach that can be used to create nanometer- to micrometer-scale self-assembling electrically conductive polymers for use in constructing biocompatible electrical circuits.

Additional data files

Additional data is available online with this article. Additional data file 1 contains a figure showing the amino-acid sequence alignments (using ClustalW [81]) for protamines (a) P1 and (b) P2. The 'anchoring' domains containing multiple arginine or lysine amino acids are highlighted in red.