28.1 Introduction

There are several ways that genes may encode alternative products. The most widely recognized mechanism is alternative splicing. However, genes may also employ noncanonical translational events to produce such products. Some of these mechanisms operate at the level of translational initiation. In prokaryotes, genes may include alternative ribosome-binding sites directing the synthesis of products that differ at the N terminus. In eukaryotes, in which ribosome-binding sites do not exist, leaky scanning allows the same kind of variation. Noncanonical elongation events can also generate products that differ at their C terminus (1–3). Such events include programmed readthrough of translational termination codons (4,5), translational frameshifts (6–9), and translational hops (10,11). In each case, the ribosome fails to follow the normal rules of decoding, leading to the synthesis of a protein that is not encoded, in the normal sense, in the DNA. In this chapter, we describe the methods employed in the identification and analysis of programmed translational frameshift sites, including their discovery, measurement of the efficiency of the events, and determination of the mechanism of the frameshift.

28.1.1 Recognizing Programmed Frameshift Sites

Usually, the first indication of a possible frameshift event comes from the analysis of open reading frames (ORFs) within a region of interest. Since translational frameshifting occurs when a ribosome shifts from one open reading frame to a second, overlapping reading frame, overlapping frames alert the researcher to the possibility of a frameshift event. We use a simple Macintosh program, DNA Strider (12), for all our analysis, though any of a large number of available programs may be used. The simplest way to visualize the existence of overlapping reading frames is with a graphic ORF map, as shown in Fig. 1. The graphic map indicates the position of each nonsense codon (by a long vertical line) and each initiation codon (by a short vertical line). The map shows the distribution of stop and start codons in each of the six reading frames, three forward and three reverse. An open reading frame is indicated by a large region devoid of nonsense codons that includes an initiation codon near its 5′ end. In Fig. 1, the first gene (in this case, the TYA1 gene encoding the gag homolog of a yeast retrotransposon) begins with an AUG at position 5180 of the sequence shown and ends at a stop codon at 6502. A second reading frame (the TYB1 gene encoding the pol analog) extends from a stop codon at position 6459 to one at 10448. As shown in the lower panel of Fig. 1, the two genes overlap in a short region, with TYB1 shifted +1 with respect to TYA1. Though overlapping frames may indicate a frameshift event, not all frameshifts are of this type. In at least one case, the dnaX gene of Escherichia coli, frameshifting results in the expression of a truncated gene product, since after shifting the ribosome encounters a premature stop codon in the shifted frame. This event, though, is as yet unique; all other known sites program ribosomes to shift from one reading frame into another, as shown in Fig. 1.
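
The ORF-map reasoning above can be approximated in a few lines of code. The following is a minimal sketch, not the DNA Strider algorithm: it assumes a DNA-alphabet sequence, looks only at the three forward frames, ignores initiation codons, and simply reports stop-free stretches and the pairs of stretches in different frames that overlap. The function names, the `min_codons` threshold, and the toy sequence are all hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_free_stretches(seq, min_codons=3):
    """Return (frame, start, end) for each stop-free run in the forward frames.

    start is the index of the first base of the run; end is the index just
    past its last codon (the position of the terminating stop codon, or the
    end of the last complete codon in the sequence).
    """
    seq = seq.upper()
    stretches = []
    for frame in range(3):
        run_start = frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if (i - run_start) // 3 >= min_codons:
                    stretches.append((frame, run_start, i))
                run_start = i + 3
        # Close the final run at the last complete codon of this frame.
        end = frame + 3 * ((len(seq) - frame) // 3)
        if (end - run_start) // 3 >= min_codons:
            stretches.append((frame, run_start, end))
    return stretches


def overlapping_pairs(stretches):
    """Pairs (upstream, downstream) in different frames that partly overlap."""
    pairs = []
    for up in stretches:
        for down in stretches:
            if up[0] != down[0] and up[1] < down[1] < up[2] < down[2]:
                pairs.append((up, down))
    return pairs


if __name__ == "__main__":
    toy = "ATGAAACCCGGGAAATAAGGGCCCAAAGGG"  # hypothetical toy sequence
    for up, down in overlapping_pairs(stop_free_stretches(toy)):
        print("upstream", up, "overlaps downstream", down)
```

As the text notes, an overlap found this way only raises the possibility of frameshifting; most overlaps do not correspond to programmed sites.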

The graphic ORF map indicates the possibility of a programmed frameshift event. However, most sequences in which open reading frames overlap probably do not promote a significant level of translational frameshifting. Programmed frameshift events depend on particular nucleotide signals in the mRNA. These can be recognized in many cases by inspection (13). Sites capable of promoting high levels of frameshifting share two characteristics: they induce a translational pause by any of several mechanisms, and they allow slippage of ribosome-bound tRNAs between cognate or near-cognate codons.

The most common type of frameshift event in the literature is a −1 simultaneous slippage site, first identified in retroviruses and coronaviruses. As shown in Figs. 2A and 2B, this event occurs on sequences of the form X-XXY-YYZ, shown grouped as codons of the zero frame. Frameshifting on these sites occurs by slippage of two tRNAs from XXY-YYZ to XXX-YYY. The precise sequence requirements of these sites have been defined by mutagenesis. In general, the nucleotides represented as X can be A, G, C, or U, and those represented as Y can be either A or U; the identity of the Z base varies from one site to another and among species. Such sites can be identified by inspection, looking for sequences that conform to these rules within a defined overlap between two ORFs. Slippage on this heptamer (the “slippery heptamer”) requires a translational pause induced by a secondary structure, usually a pseudoknot (Fig. 2A). This structure occurs immediately downstream of the heptamer, beginning about 6 nt away (13). The structure has no stereotyped form: the size and structure of the pseudoknots vary widely, and some sites replace the pseudoknot with a stem-loop, a pair of “kissing” stem-loops (two stem-loops that interact by base pairing within the loop region), or no apparent structure at all.

Sequences capable of forming most of these secondary structures are not easy to identify by computer analysis. No general DNA sequence analysis package can predict pseudoknots; for example, the widely used computer program FOLD of Zuker and his colleagues (14) identifies only stem-loops. A specialized computer program, STAR, developed by Pleij and his colleagues, can predict higher-order structures like pseudoknots (15); it is available from the authors in forms suitable for use with Macintosh and IBM personal computers, and with mainframes.

Programmed frameshifts analogous to the eukaryotic events occur in bacteria, though they differ in significant ways. Many of these sites (as in the dnaX gene [16]) include a third element that stimulates efficient frameshifting, a Shine-Dalgarno interaction between the 16S rRNA and a site about 10 nt upstream of the heptamer (Figs. 2B and 2C). In addition, some of the bacterial −1 frameshift sites, notably the site in the IS1 element, require a 4-nt slip site instead of the heptamer (in IS1 it is A-AAA) (17). The possibility of such noncanonical −1 slippage sites complicates the effort to identify programmed sites in bacterial genes.

Some programmed frameshift sites cause the ribosome to shift in the +1 direction, including the sites in the gene prfB encoding release factor-2 in E. coli (18) and in the retrotransposon Ty1 of the yeast Saccharomyces cerevisiae (9) (see Figs. 2C and 2D). +1 frameshifting occurs by slippage of a single tRNA during a pause caused by the slow recognition of the next codon. The pause-inducing codon may be either a slowly decoded sense codon (as in Ty1) or a poorly recognized nonsense codon (as in prfB). Slippage again occurs between cognate or near-cognate codons. The slip site in prfB is CUU-U, with a tRNALeu slipping from CUU to UUU; in Ty1 the slippage occurs at CUU-A, between CUU and UUA. Other slippage sites are possible. For example, in bacteria, slippage-induced frameshifting occurs most efficiently on the sequences CCC-U, UUU-U, GUU-U, and CCU-U. All of these share the capacity for forming at least 2 bp with the slipped tRNA. Slippage is not a universal requirement for +1 frameshifting, since in yeast there is no correlation between tRNA slippage and frameshifting. However, the ability of each of the 64 codons to stimulate frameshifting, even if not by tRNA slippage, has been measured, identifying eight tRNAs that can stimulate frameshifting when bound to 11 codons (19).

These +1 frameshifts are stimulated by the slow recognition of the next available codon. In many cases that codon is a poorly recognized termination codon. The ability of a stop codon to induce pausing is inversely related to the rate at which it is recognized by release factor. A tetranucleotide sequence, consisting of the stop codon and its 3′ neighbor, defines the rate of recognition. Termination efficiency varies widely among the 12 possible sequences (UAG-N, UAA-N, and UGA-N) in a species-specific manner (20), allowing one to predict which termination codons would be most likely to promote a translational pause. The predictions of this analysis have turned out to be accurate for the yeast system; the predicted poorly recognized codons UAG-C, UAA-C, and UGA-C all provide a pause sufficient to promote frameshifting (21).

Other frameshifts are induced by slowly decoded sense codons. The rate of decoding of each codon is also species specific, though it has not been defined well enough to predict which codons would induce frameshifting. It is known that the tRNAArg isoacceptors specific for AGG and AGA are in low abundance, and that merely juxtaposing two of either codon induces high levels of frameshifting (22). The AGG-decoding tRNA is also limiting in yeast, and the pause in the Ty1 system occurs at an AGG codon (9). Similarly, the slowly decoded Ser codon, AGU, induces frameshifting in the retrotransposon Ty3 (23).
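
The inspection rule for −1 simultaneous slippage sites described above (X-XXY-YYZ, with X any nucleotide, Y either A or U, and Z any nucleotide) lends itself to a simple pattern search. The sketch below is an illustration only: the function name and the optional `orf_start` argument are hypothetical, and the search says nothing about the downstream pseudoknot or stem-loop, nor about the 4-nt bacterial slip sites or the +1 sites, which do not follow this pattern.

```python
import re

# X-XXY-YYZ: three identical bases, then AAA or UUU, then any base.
# The lookahead allows overlapping candidates to be reported.
HEPTAMER = re.compile(r"(?=(([ACGU])\2\2(?:AAA|UUU)[ACGU]))")


def find_slippery_heptamers(rna, orf_start=None):
    """Return (index, heptamer) for each candidate X-XXY-YYZ site.

    If orf_start (index of the first base of a zero-frame codon) is given,
    only heptamers whose XXY and YYZ triplets are zero-frame codons are kept.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-alphabet input as well
    hits = []
    for match in HEPTAMER.finditer(rna):
        start = match.start()
        # The zero-frame codon XXY begins one base after the first X.
        if orf_start is not None and (start + 1 - orf_start) % 3 != 0:
            continue
        hits.append((start, match.group(1)))
    return hits


if __name__ == "__main__":
    # Hypothetical toy sequence containing the dnaX-type heptamer A-AAA-AAG.
    print(find_slippery_heptamers("GGAAAAAAGCC", orf_start=0))
```

Candidates found this way still need the experimental tests described in this chapter, since most sequences matching the pattern are not expected to be programmed sites.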

Fig. 1. Graphic open reading frame analysis. The open reading frame map for the region containing a Ty1-912 insertion at the HIS4 locus is shown. FUS1, the uncharacterized gene YCL028w, and BIK1 are upstream of Ty1. ORFs were searched in all six reading frames, indicated as −3 to 3. The long thin lines within each frame indicate stop codons and the short lines indicate methionine codons. An ORF is indicated by a stretch of sequence that is free of stop codons.

Fig. 2. The signals that cause frameshifting in four different systems. (A) MMTV gag-pro frameshift signal: −1 simultaneous slippage frameshifting in mouse mammary tumour virus (MMTV). (B) dnaX frameshift signal: −1 dual tRNA slippage in the E. coli dnaX gene. (C) prfB frameshift signal: +1 frameshifting in the release factor-2 gene prfB in E. coli. (D) Ty1 frameshift signal: +1 frameshifting in the retrotransposon Ty1 in the yeast Saccharomyces cerevisiae. All the frameshift events shown require a slippery sequence as well as a pause that is provided by a pseudoknot (A), a stem-loop structure (B), a stop codon (C), or a slowly decoded codon (D). Frameshifting in the dnaX (B) and the prfB (C) genes also requires interaction between the Shine-Dalgarno sequence and the complementary sequence in the 16S rRNA.

28.1.2 In Vivo and In Vitro Assay Systems

Estimating the efficiency of frameshifting at any site requires the construction of a reporter system in which the expression of a readily assayed product depends on frameshifting. Though these assays can be done using either an in vivo expression system or an in vitro approach, the vectors used are very similar. All are variations on the dicistronic construct first used by Jacks and Varmus in analyzing the −1 frameshift in Rous sarcoma virus (RSV) (6). For example, Weiss et al. (18) used a dicistronic construct to study the mechanism of frameshifting at the prfB gene (see Fig. 3A). An upstream gene encoding S. aureus protein A was fused through the prfB frameshift site to the lacZ gene of E. coli. Protein products generated by normal decoding or by frameshifting could both be purified based on the affinity of protein A for the Fc portion of IgG, so the efficiency of frameshifting could be estimated by comparing the amount of each product. Second, the efficiency of frameshifting could be estimated by measuring the enzymatic activity of the lacZ product, β-galactosidase; the efficiency was estimated as the ratio of expression of β-galactosidase from frameshift constructs vs in-frame controls. This allowed a numerical estimate of efficiency, as well as comparison among various mutant forms of the frameshift site. In some frameshift reporters the upstream ORF is so short that its synthesis cannot be followed independently. In this case, one estimates efficiency only by comparing the expression of the downstream gene, usually lacZ, from frameshift and in-frame constructs. This approach eliminates the opportunity for direct comparison of products generated by frameshifting and by normal decoding. However, since the frameshift site is near the N-terminus of the protein, it allows the peptide expressed across the frameshift site to be sequenced by the Edman degradation method.
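
The ratio described above (β-galactosidase expression from the frameshift construct divided by that from the in-frame control) is simple arithmetic, sketched here for replicate measurements. The function name is hypothetical, averaging replicates before taking the ratio is an assumption of this sketch rather than a procedure stated in the text, and the numbers in the usage lines are invented for illustration.

```python
from statistics import mean


def frameshift_efficiency(frameshift_units, inframe_units):
    """Apparent efficiency: mean activity of the frameshift construct
    divided by mean activity of the in-frame (frame fusion) control."""
    control = mean(inframe_units)
    if control <= 0:
        raise ValueError("in-frame control activity must be positive")
    return mean(frameshift_units) / control


if __name__ == "__main__":
    # Invented replicate activities, in arbitrary β-galactosidase units.
    efficiency = frameshift_efficiency([40, 44], [100, 110])
    print(f"apparent frameshift efficiency: {efficiency:.0%}")
```

The value is an apparent efficiency: it compares expression of two constructs and is not a direct count of ribosomes changing frame.
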
The vector we have used to study frameshifting in yeast is shown in Fig. 3B. This plasmid, pMB38-9merWT (9), is a reporter construct of the second type, in which expression of the lacZ gene depends on frameshifting at a site derived from the Ty1 element that has been inserted 33 codons downstream of the initiation codon. Expression of the reporter gene depends on the promoter of the HIS4 gene, which encodes enzymes of histidine biosynthesis; this promoter was selected because of its high activity. The Ty1 frameshift site was inserted between BamHI and KpnI sites located 33 codons into the gene. The lacZ gene was inserted downstream of the KpnI site such that it was in the +1 frame with respect to the upstream HIS4 reading frame. Expression of β-galactosidase therefore depends on ribosomes shifting reading frame at the Ty1 programmed frameshift site. To estimate the efficiency of this event, a second plasmid was constructed in which a single nucleotide was deleted from the frameshift site, fusing the HIS4 and lacZ reading frames. In this case, all ribosomes initiating translation of HIS4 will continue reading into lacZ (excision of the nucleotide to fuse the genes also inactivated the frameshift signal). It should be understood that the ratio of expression of the frameshift and frame fusion constructs is not necessarily the same as the microscopic efficiency of frameshifting at the programmed frameshift site. The amount of protein expressed from a construct depends on a variety of factors (promoter strength, initiation efficiency, elongation efficiency, and translational processivity). All of these factors are the same for the two constructs, since they differ only by the lack of a single nucleotide in a 10.8-kb vector, so in principle the ratio reflects frameshifting alone. However, we have found that when the efficiency of frameshifting is sufficiently elevated we lose the ability to measure it (19). When frameshifting becomes highly efficient (greater than 50% of the ribosomes apparently changing frames), the requirement for ribosomes to shift reading frames is no longer limiting for the expression of β-galactosidase, a foreign protein, in yeast. The origin of the effect is not clear, but it appears that an event, or events, occurring after passage through the frameshift site becomes more limiting than the frameshift itself. It is also not clear to what extent this effect biases our estimates of frameshift efficiency when frameshifting is less active.

Following the identification of the minimal frameshift site, one has to identify the actual frameshift event. This entails sequencing the transframe protein, which first requires isolating the fusion protein. Several methods simplify this isolation, for example a 6-His tag, which allows purification on Ni-affinity columns. We used a different approach, expressing a β-galactosidase fusion protein and purifying it by immunoaffinity chromatography using the following protocol.

Fig. 3. Frameshift reporter systems. (A) The plasmid used by Weiss et al. (18) to assay for frameshifting. Expression of the frameshift construct from the plasmid results in a protein A-frameshift sequence-β-galactosidase fusion protein. Protein A can be used as an epitope to purify both alternative products, and thus to quantitate the amount of frameshifting and readthrough. (B) The pMB38-9merWT vector used by this laboratory to quantitate frameshifting in Ty1 (9) and Ty3 (23) in yeast. Expression from the plasmid is driven by the HIS4 promoter. Frameshifting frequency is determined by measuring LacZ expression in the zero frame (in the absence of frameshifting) and in the +1 frame (due to frameshifting).

28.2 Materials

1. Buffer A: 50 mM Tris-HCl, pH 7.4, 150 mM NaCl, 5 mM EDTA, 0.1% Tween-20, 10 mM β-mercaptoethanol, and 0.5 mM phenylmethylsulfonylfluoride (PMSF).

2. Buffer B: 50 mM Tris-HCl, pH 7.3, 0.2% NP-20.

3. Frozen glass beads (Sigma, St. Louis, MO; G-9268, 425–600 microns) that have been previously soaked in nitric acid and subsequently washed at least 10 times.

4. Anti-β-galactosidase immunoaffinity column (Protosorb, Promega, Madison, WI).

5. High-pH elution buffer: 0.1 M NaHCO3 and Na2CO3, pH 10.8.

6. Tris-NaCl buffer: 50 mM Tris-HCl, pH 7.3, 150 mM NaCl.

7. Centricon 30 (Amicon, Danvers, MA) centrifugation filter cartridges.

8. HPLC-grade water.

28.3 Methods

28.3.1 Purification of lacZ Fusions by Immunoaffinity Chromatography Prior to N-Terminal Protein Sequencing

1. The plasmid containing the frameshift construct is transformed into yeast (24,25). The cells are grown to saturation under selective pressure consistent with the presence of URA3 on the vector, in ten 1-L volumes in 2-L flasks, using standard yeast methodology.

2. Cells are pelleted at 2000g in a Sorvall RC-5B centrifuge and washed with binding buffer (buffer A). The cells are pelleted again and weighed, resuspended in an equal volume of buffer A, and mixed with an equal volume of frozen glass beads. The suspension is then transferred to a Bead-Beater (Biospec Products, Bartlesville, OK) (see Note 1).

3. In a cold room, the cells are disrupted by four to six cycles of 1 min of disruption in the Bead-Beater followed by 1 min of cooling on ice. The cells are viewed under the microscope to estimate the extent of cell breakage. Disruption is continued until greater than 75% of the cells are broken.

4. After cell disruption, the supernatant, excluding the beads, is drawn off using precooled pipets, transferred to centrifuge tubes, and centrifuged at 100,000g for 1 h to eliminate cell debris. The supernatant (S100 fraction) is then transferred to fresh 50-mL tubes and stored cold.

5. The amount of β-galactosidase protein in the preparation is determined using the standard in vitro assay (26). The amount of enzyme present can be estimated from the specific activity of pure β-galactosidase (300,000 U/mg protein, where a unit is the amount of enzyme necessary to cleave 1 nmol of substrate, orthonitrophenyl-β-D-galactopyranoside [Sigma, N1127], per minute at 28°C).

6. While the extract is being prepared, the anti-β-galactosidase immunoaffinity column is equilibrated by flushing it with at least 3 vol of buffer A (see Note 2). The S100 fraction is passed over the column and the eluate is collected in a 50-mL tube. During this process, the β-galactosidase fusion protein adheres to the column. Load no more enzyme than the capacity of the column, which is approx 1 mg/mL of bed volume; the number of milligrams of β-galactosidase can be calculated from the activity assays, assuming that the fusion protein has the specific activity given in step 5. Residual S100 extract can be stored on ice. The flowthrough fraction should be monitored for β-galactosidase activity to ensure that the expected amount of enzyme has bound to the column.

7. The column is then washed with at least three column volumes of buffer B and eluted three times with 1 mL of high-pH elution buffer, followed by elution with 1 mL of Tris-NaCl buffer.

8. The combined eluates are concentrated using Centricon 30 (Amicon) centrifugation filter cartridges and washed extensively with HPLC-grade water (4–6 mL) to eliminate the high-salt buffer.

9. We have found that the protein eluted at this point is not pure enough for direct sequencing and requires further purification to remove contaminating proteins. To further purify the protein, repeat the immunoaffinity chromatography in steps 5–8 (see Note 3). The concentrated eluate is then transferred to 1.5-mL microfuge tubes and can be stored frozen. Although freezing destroys the enzymatic activity of the protein, it minimizes degradation and improves the chances of obtaining a good sequence. An aliquot is run on an SDS-PAGE gel to determine its purity. At this point, the protein should be sufficiently pure to be sequenced directly using the Edman degradation technique.
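
The arithmetic behind steps 5 and 6 can be sketched as follows. This is an illustration under the protocol's stated figures (a specific activity of 300,000 U/mg and a column capacity of approximately 1 mg/mL of bed volume); the function names and the example numbers are hypothetical, and the fusion protein is assumed, as in step 6, to have the same specific activity as pure β-galactosidase.

```python
SPECIFIC_ACTIVITY_U_PER_MG = 300_000  # pure β-galactosidase, from step 5
COLUMN_CAPACITY_MG_PER_ML = 1.0       # approximate capacity, from step 6


def beta_gal_mg(units):
    """Estimated mass of β-galactosidase (mg) from total activity (U)."""
    return units / SPECIFIC_ACTIVITY_U_PER_MG


def max_load_units(bed_volume_ml):
    """Largest activity (U) that should be loaded on a column of this size."""
    return bed_volume_ml * COLUMN_CAPACITY_MG_PER_ML * SPECIFIC_ACTIVITY_U_PER_MG


def fraction_bound(loaded_units, flowthrough_units):
    """Fraction of loaded activity retained, from the flowthrough assay."""
    return 1 - flowthrough_units / loaded_units


if __name__ == "__main__":
    # Invented example: 450,000 U in the S100 fraction, 2-mL column bed.
    total_units = 450_000
    print(f"enzyme in extract: {beta_gal_mg(total_units):.2f} mg")
    print(f"column limit: {max_load_units(2.0):,.0f} U")
    print(f"bound: {fraction_bound(total_units, 45_000):.0%}")
```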

Fig. 4. Analysis of protein sequence. The RNA sequence and the predicted protein sequence in the zero frame and in the +1 frame are shown in the top panel. The observed amino acid profiles for alanine, serine, and valine from sequencing the Ty3-β-galactosidase fusion protein are shown in the bottom panel. The presence of valine and not serine at position 10 is indicated by the arrows. The box in the RNA sequence indicates the frameshift site.

28.3.2 Analysis of the Protein Sequence

Deducing the event that occurs at a programmed frameshift site depends vitally on the protein sequence encoded across the site, and on the effect of site-specific mutations within it. Various programmed alternative translation events, such as +1 frameshifts, −1 frameshifts, translational hops, and readthrough of termination codons, can be inferred from the absence of particular amino acids encoded at the site. Fig. 4 illustrates the structure of the +1 frameshift site in the Ty3 retrotransposon of yeast, together with the observed protein sequence encoded across the site (23). The tenth amino acid expected from normal translation is serine, which does not appear in the protein expressed by frameshifting. In the frameshift product, the tenth amino acid is valine, which is encoded in the +1 frame overlapping the serine codon. After the valine at position 10, the sequence of the peptide continues to match the predicted sequence in the +1 frame. This indicates that the change in reading frame occurs after decoding of the ninth amino acid, alanine, and occurs by reading of the +1 frame codon, GUU, as valine. In this case, the site of the frameshift is uniquely determined.

In some cases, however, the position of the frameshift is ambiguous. For example, protein sequence data cannot differentiate between single tRNA and dual tRNA slippage at −1 simultaneous slippage frameshift sites. In the case of the dnaX site, the slippery heptanucleotide A-AAA-AAG is decoded as Lys-Lys in both the 0 and −1 frames (AAA-AAG and AAA-AAA). Frameshifting occurring either after decoding of the AAG, the predicted −1 frameshifting mechanism, or by slippage on A-AAA before decoding AAG would give the same peptide product. Distinguishing between these two mechanisms required site-specific mutagenesis of the site. Mutating the AAG to AUG, to interfere with slippage on that codon, reduced frameshifting drastically, consistent with the simultaneous slippage model (27). By contrast, the insertion sequence IS1 of E. coli includes an apparent −1 frameshift site containing a slippery heptamer, A-AAA-AAC, but frameshifting requires only the A-AAA motif and apparently does not occur by the simultaneous slippage mechanism (17). This result underscores the need for detailed analysis of any putative frameshift site.
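
The comparison of an observed peptide with the zero-frame and shifted-frame translations, as described above, can be sketched in code. This is a minimal illustration assuming the standard genetic code and a single shift; the function names and the toy RNA are hypothetical (the toy sequence is not the Ty3 site, although it was built to mimic the Ala, Ser/Val pattern discussed above).

```python
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in UCAG order; "*" marks a stop codon.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(rna, start=0):
    """Translate complete codons of rna beginning at index start."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(start, len(rna) - 2, 3))


def locate_frameshift(rna, observed, shift):
    """Return the 1-based position of the first residue read in the new frame.

    shift is +1 or -1. Returns None if the observed peptide matches the zero
    frame throughout, or if it is not explained by a single shift of this size.
    """
    zero = translate(rna)
    for i, residue in enumerate(observed):
        if i >= len(zero) or residue != zero[i]:
            start = 3 * i + shift
            if start < 0:
                return None
            if translate(rna, start).startswith(observed[i:]):
                return i + 1
            return None
    return None


if __name__ == "__main__":
    toy_rna = "AUGGCUAGUUGGCCAUAA"  # hypothetical: zero frame reads M-A-S-W-P-stop
    print(translate(toy_rna))
    print(locate_frameshift(toy_rna, "MAVGH", shift=1))
```

The function reports the first residue that differs from the zero frame. Where both frames encode the same amino acids across the site, as with Lys-Lys at the dnaX heptamer, the true point of slippage may lie upstream of the reported position, which is why mutagenesis is needed to resolve such cases.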

28.4 Notes

1. Similar procedures could be used for purifying lacZ fusion proteins following expression in bacterial or other eukaryotic cells; the only changes necessary would be with regard to breakage of the cells.

2. Care should be taken to ensure that the liquid level in the column does not drop below the level of the beads in the column.

3. The anti-β-galactosidase column can be regenerated by washing it with at least three column volumes of buffer A.