Introduction

RNA polymerase II (Pol II) transcribes DNA and produces pre-mRNAs that are processed into mature mRNAs. The processing steps include capping, splicing, and cleavage and polyadenylation (CPA) at the 3′ end of the transcripts1,2. Alternative cleavage and polyadenylation (APA) in the last exon generates mRNA isoforms that encode the same protein, but with different 3′ untranslated regions (UTRs)3,4,5 (Fig. 1a). The use of intronic polyadenylation sites (PASs) generates truncated proteins, but also changes the 3′ UTR of the resulting mRNA6,7.

Fig. 1: Cleavage and polyadenylation of mRNA isoforms at their 3′ ends.

a | Alternative cleavage and polyadenylation (APA) generates mRNA isoforms that differ in their 3′ untranslated regions (UTRs). mRNA processing at intronic polyadenylation (IPA) sites generates mRNA isoforms (IPA isoforms) that encode proteins with alternative C-termini (light blue part of protein). mRNA processing at proximal or distal polyadenylation sites (PASs) in terminal exons generates mRNA isoforms with short 3′ UTRs (SU) or long 3′ UTRs (LU) that encode proteins with the same amino acid sequence. The grey boxes in the protein symbols represent protein domains. Introns are not drawn to scale. b | During transcription by RNA polymerase II (Pol II), the cleavage and polyadenylation (CPA) machinery binds the PAS hexamer motif AAUAAA and surrounding sequence elements in the nascent RNAs. Shown are CPA factors bound to the pre-mRNA (or nascent RNA) and to the C-terminal domain of Pol II while it is transcribing DNA. The different CPA protein complexes are colour-coded. Protein names or symbols (and gene symbols (in parentheses) when different) are given in the boxes. c | Sequence context of functional PASs. Endonucleolytic cleavage of the nascent RNA occurs ~20 nucleotides downstream of a PAS hexamer that is located in a suitable sequence context containing the UGUA and (G+U)-rich or U-rich sequence upstream and downstream, respectively. The colours of the RNA elements correspond to the colours of the complexes in part b that interact with them. CFI, cleavage factor I; CFII, cleavage factor II; CLP1, cleavage factor polyribonucleotide kinase subunit 1; CPSF1, CPA specificity factor subunit 1; CSTF, cleavage stimulation factor; FIP1, factor interacting with PAPOLA and CPSF1; nt, nucleotides; PAF1, RNA polymerase II-associated factor 1; PABPN1, poly(A)-binding protein 2; PAP, poly(A) polymerase; RBBP6, RB-binding protein 6; SCAF4, serine and arginine-related C-terminal domain-associated factor 4; TSS, transcription start site; WDR33, WD repeat domain 33.

The different 3′ UTRs contain a large number of regulatory elements that control mRNA and protein abundance, mRNA and protein localization, and protein complex assembly and therefore protein function8,9,10. In addition to determining 3′ UTR length, CPA is essential for the generation of mature mRNAs as incompletely processed transcripts are subject to degradation and cannot serve as the templates for protein synthesis. Moreover, PAS cleavage is an important determinant of mRNA expression levels11,12,13,14.

mRNA 3′-end formation occurs co-transcriptionally, and is accomplished by the CPA machinery, which comprises a multiprotein core complex and dozens of associated factors15,16,17,18,19 (Fig. 1b). CPA specificity factor (CPSF) recognizes the PAS hexamer motif AAUAAA (or variants thereof) in the nascent RNA. Additional upstream and downstream regulatory RNA sequence elements recruit further CPA factors that promote 3′-end processing. These elements include UGUA, the binding site of cleavage factor I (CFI), and the downstream U-rich or (G+U)-rich region, which is bound by cleavage stimulation factor (CSTF)2,14,20 (Fig. 1b,c). The additional elements are necessary to define functional PASs because, on average, a PAS hexamer occurs every 500 nucleotides in pre-mRNAs, yet CPA does not occur at most of these sites3.
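The sequence context described above can be illustrated with a minimal Python sketch. It scans an RNA sequence for the AAUAAA hexamer and reports whether a UGUA motif lies upstream and a (G+U)-rich stretch lies downstream. The window sizes (50 nucleotides upstream, 40 nucleotides downstream), the 70% G+U threshold and the function name are illustrative assumptions, not values taken from the cited studies, and hexamer variants are ignored.

```python
import re

def find_candidate_pas(rna, upstream=50, downstream=40, gu_threshold=0.7):
    """Describe each AAUAAA hexamer and its flanking sequence context.

    Window sizes and the G+U threshold are illustrative assumptions.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA or RNA input
    hits = []
    # A lookahead is used so that overlapping hexamers are also reported.
    for match in re.finditer(r"(?=AAUAAA)", rna):
        start = match.start()
        end = start + 6
        up = rna[max(0, start - upstream):start]
        down = rna[end:end + downstream]
        gu_fraction = sum(base in "GU" for base in down) / len(down) if down else 0.0
        has_ugua = "UGUA" in up
        hits.append({
            "position": start,
            "has_upstream_UGUA": has_ugua,
            "downstream_GU_fraction": round(gu_fraction, 2),
            "in_suitable_context": has_ugua and gu_fraction >= gu_threshold,
        })
    return hits

# Hypothetical sequence: UGUA upstream, hexamer, (G+U)-rich stretch downstream
example = "GGCUGUACC" + "AAUAAA" + "GUUUGUGUUUGUUGUGUUUG"
for hit in find_candidate_pas(example):
    print(hit)
```

A hexamer without the flanking elements is still reported, but with `in_suitable_context` set to `False`, which mirrors the point that the hexamer alone does not define a functional PAS.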

Alternative 3′ UTR isoforms are expressed in a highly gene-specific and cell-type-specific manner3,5,21. In this Review, we discuss how expression of 3′ UTR isoforms is regulated. As the CPA machinery carries out APA, the concentration of CPA factors is an important determinant of APA. Depletion of these factors often shifts APA globally and changes 3′ UTR isoform expression of hundreds of genes22,23,24,25,26,27,28,29,30. However, environmental changes during development, stress and metabolic adaptation induce APA changes of only subsets of genes31,32,33,34,35.

We discuss how gene-specific 3′ UTR isoform changes are accomplished, including cell-type-specific abundance of regulatory factors and enzymes that post-translationally modify factors that control transcription elongation, transcription termination and CPA. Gene-specific sequence elements in promoters and enhancers also affect 3′ UTR isoform choice14,36,37,38. These layers of regulation allow cells to integrate environmental signals and respond with appropriate changes in gene and 3′ UTR isoform expression.

One of the surprising findings in the past 10 years was the observation that gene expression and APA are largely independent, as genes that change their 3′ UTR isoforms often do not considerably change their overall expression level3,5,27,39,40,41. Moreover, genetic variants associated with gene expression changes rarely overlap with genetic variants that affect APA42,43,44. These data indicate that single-UTR genes, which contain one functional PAS at their 3′ ends, and multi-UTR genes, which have at least two functional PASs, differ substantially in their mode of regulation. We discuss the implications of such independent regulation and highlight the functions of alternative 3′ UTRs beyond the regulation of mRNA abundance35,45,46,47,48,49.

We feature new transformative tools that were developed through recent advances in single-cell RNA sequencing (scRNA-seq) and CRISPR–Cas technologies. These methods allow the reliable quantification of 3′ UTR isoforms in any cell type from publicly available datasets5,50,51,52, and facilitate manipulation of individual 3′ UTRs at endogenous gene loci to study their regulatory logic and function53,54,55. Together with small molecules that modulate PAS cleavage activity to control mRNA and protein expression, the insights gained by these novel tools will enable the development of RNA therapeutics that take advantage of isoform-specific and compartment-specific targeting.

Two classes of APA

APA that occurs in the terminal exon generates tandem 3′ UTR isoforms (short and long isoforms) and is often called ‘3′ UTR APA’. This class of APA does not change the protein coding sequence. By contrast, APA can also occur in introns — termed ‘intronic APA’ — thereby generating truncated proteins.

3′ UTR APA

Sequences downstream of the coding region of genes contain many cis-regulatory elements. 3′ UTR length, and thus the regulatory potential, is determined by PAS choice. Multi-UTR genes generate mRNA isoforms with short 3′ UTRs or long 3′ UTRs3 (Fig. 1a). Although mRNA isoforms that differ only in their 3′ UTRs encode the same protein, the different 3′ UTRs can determine the half-life of mRNAs or proteins and their subcellular localization8,9. Differences in 3′ UTRs of gene homologues, which often encode proteins with highly similar amino acid sequences, allow differential regulation of each protein49,56. Both single-UTR and multi-UTR genes encode proteins with similar size distribution and a median coding region length of ~1,250 nucleotides. Sequencing methods that identify mRNA 3′-end cleavage sites transcriptome-wide and measure alternative 3′ UTR isoform expression are called 3′-end sequencing methods and are abbreviated to 3′-seq here. 3′-seq revealed that 3′ UTR length differs substantially between single-UTR genes and multi-UTR genes and resulted in an estimated median 3′ UTR length of ~600 and ~1,900 nucleotides (for the longest isoform), respectively6. The use of the coding-sequence proximal PAS in multi-UTR genes results in expression of short 3′ UTR isoforms with a median 3′ UTR length of ~300 nucleotides41. As APA often removes most 3′ UTR regulatory elements, 3′ UTR-mediated regulation may be particularly important for multi-UTR genes.
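As a rough illustration of how PAS choice sets 3′ UTR length, the following Python sketch computes isoform lengths from the position of the stop codon and the cleavage sites, together with a read-weighted mean length of the kind that could be derived from 3′-seq counts. The coordinates, read counts and function names are hypothetical; only the ~300-nucleotide and ~1,900-nucleotide lengths echo the medians quoted above.

```python
def utr_lengths(stop_codon_end, cleavage_sites):
    """3' UTR length of each isoform, in nucleotides.

    Positions are transcript coordinates; sites are returned proximal first.
    """
    return [site - stop_codon_end for site in sorted(cleavage_sites)]

def weighted_mean_utr_length(lengths, read_counts):
    """Read-weighted mean 3' UTR length across the isoforms of one gene."""
    total = sum(read_counts)
    if total == 0:
        raise ValueError("at least one read is required")
    return sum(length * count for length, count in zip(lengths, read_counts)) / total

# Hypothetical multi-UTR gene with a proximal and a distal cleavage site
lengths = utr_lengths(1250, [1550, 3150])                 # [300, 1900]
mean_length = weighted_mean_utr_length(lengths, [75, 25])  # 700.0
print(lengths, mean_length)
```

In this toy case, a shift of reads towards the proximal site lowers the weighted mean, which is one simple way to summarize 3′ UTR shortening for a gene.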

Intronic APA

Transcripts that are generated from intronic PASs are called ‘intronic polyadenylation (IPA) isoforms’ or ‘alternative last exon isoforms’. Many IPA isoforms have a physiological role and contribute to proteome diversification by generating proteins with alternative carboxy termini (Fig. 1a). IPA isoforms are often expressed in a cell-type-specific manner6,7,35,57,58,59,60. However, IPA can also occur in the context of premature CPA, in which no functional protein is produced33,34,59,61. This type of IPA arises following mutation of CPA factors in cancer, or it can be induced experimentally through manipulation of transcript elongation or splicing factors33,34,59,61. In B cell leukaemia, which is a disease with very few DNA mutations, inactivation of tumour-suppressor genes occurs mostly through the generation of truncated proteins by IPA instead of by DNA mutations59.

Premature CPA is also used by cells to downregulate the expression of full-length mRNAs. For example, IPA is upregulated following heat shock, which results in the downregulation of many genes not directly involved in the stress response62. IPA also determines the expression of the CPA factors PCF11 and CSTF3 through feedback autoregulation29,63,64. IPA is tightly controlled by co-transcriptional splicing and Pol II elongation rates. For example, binding of the splicing factor U1 small nuclear RNA strongly inhibits intronic PAS use, thereby allowing the generation of full-length mRNAs61. Conversely, a decrease in splicing factor availability or interference with splice site recognition increases IPA. As splicing factors and CPA-promoting factors physically interact with Pol II, they can compete for access to the elongating polymerase and tip the balance towards expression of IPA or full-length mRNAs33,34,65,66.

Regulation of alternative 3′ UTRs

APA is regulated both co-transcriptionally and post-transcriptionally (Fig. 2). As CPA occurs while Pol II is transcribing a gene, Pol II elongation dynamics have a major role in regulating APA. Pol II termination factors extensively interact with RNA-binding proteins such as splicing factors and CPA factors that recognize RNA sequence elements to control where and when a pre-mRNA is cleaved and polyadenylated2,14,20 (Fig. 2). RNA-binding proteins that are involved in the regulation of APA were reviewed previously1,2, and their effects are summarized in Table 1. All the factors involved can be regulated at the levels of abundance and activity, often in a cell-type-specific and condition-specific manner, thereby allowing APA at individual genes.

Fig. 2: Overview of regulation of alternative polyadenylation by co-transcriptional and post-transcriptional mechanisms.

Alternative cleavage and polyadenylation (APA) is regulated both co-transcriptionally and post-transcriptionally. Cis-regulatory elements in the RNA and the DNA are necessary but not sufficient for cleavage and polyadenylation (CPA). As CPA occurs while RNA polymerase II (Pol II) transcribes a gene, Pol II elongation dynamics regulate APA. A selection of APA-regulating factors is given. Pol II termination factors show extensive crosstalk with RNA-binding proteins such as splicing factors and CPA factors that recognize RNA sequence elements and control where and when a pre-mRNA is cleaved and polyadenylated. Following processing of 3′ untranslated region isoforms, the expression level of individual APA isoforms is further regulated by post-transcriptional processes. Both the abundance and the activity of factors that control APA can be regulated co-transcriptionally and post-transcriptionally, often in a cell-type-specific and condition-specific manner, thereby allowing dynamic regulation of APA at individual genes. NEXT, nuclear exosome targeting (a trimeric protein complex); NXF1, nuclear RNA export factor 1; PAF1, RNA polymerase II-associated factor 1; PAXT, poly(A) tail exosome targeting (a trimeric protein complex); SCAF4, serine and arginine-related C-terminal domain-associated factor 4; SPT5, suppressor of Ty 5; SRSF, serine and arginine-rich splicing factor 3; THOC5, THO complex 5.

Table 1 RNA-binding proteins that co-transcriptionally regulate alternative polyadenylation

Pol II elongation dynamics

DNA sequence and chromatin structure, together with transcript elongation factors, determine Pol II dynamics and the functional state of the polymerase. Pol II elongation and termination factors cooperate or compete with CPA factors to control pre-mRNA CPA.

DNA sequence and topology

The speed of transcript elongation is decreased by Pol II pausing or backtracking67,68. Increased Pol II pausing downstream of a functional PAS promotes CPA at this site and contributes to APA regulation67,69,70 (Fig. 3a). In addition to promoter-proximal Pol II pausing, which is not discussed here, Pol II pausing in gene bodies is caused by certain DNA sequences, termed ‘pause sites’, or by structures in RNA and DNA, such as G-quadruplexes67,70.

Fig. 3: Co-transcriptional regulators of alternative polyadenylation.

a | A roadblock in the DNA can affect RNA polymerase II (Pol II) translocation and use of alternative polyadenylation sites (PASs). The roadblock can be a DNA sequence element, such as a pause site, a structural element, such as a G-quadruplex, or a large protein that impedes Pol II elongation, thereby increasing the use of a proximal PAS. b | Cyclin-dependent kinase 12 (CDK12) is required for transcript elongation and suppression of premature cleavage and polyadenylation (CPA) of long genes. CDK12 promotes transcript elongation by phosphorylating Pol II. Loss of CDK12 predominantly affects DNA damage repair genes because of their extensive lengths and their lower U1 small nuclear RNA to PAS ratio. The lower processivity of Pol II results in premature CPA, thereby reducing expression of full-length mRNAs and protein output. c | Discovery of diverse alternative cleavage and polyadenylation (APA) regulators in a single genetic screen in Caenorhabditis elegans. unc-44 encodes the cytoskeleton protein ankyrin. Use of an intronic PAS in unc-44 generates intronic polyadenylation (IPA) mRNA isoforms, which encode ankyrin that is ubiquitously expressed, including in immature neurons. Giant ankyrin is expressed only in mature neurons, where it is generated from the full-length (FL) mRNA isoform through suppression of the intronic PAS. Worms that lack casein kinase 1δ (CK1δ; encoded by kin-20) have impaired movement owing to a block in neuron maturation that is caused by the lack of giant ankyrin expression. A screen for suppressor mutations identified 13 genes that, when knocked out, were able to restore the expression of giant ankyrin and the generation of mature neurons in kin-20-KO worms. The screen identified mutations that disrupt the intronic PAS sequence of the ankyrin gene, and mutations in several factors that control transcript elongation, transcription termination and CPA and in enzymes that regulate their activity. 
3′-seq, 3′-end sequencing; CTD, C-terminal domain; KO, knockout; LU, long 3′ untranslated region isoform; SU, short 3′ untranslated region isoform.

Increased Pol II pausing can also be accomplished by a change in the chromatin environment71,72. For example, DNA methylation in gene bodies is important to prevent binding of the insulator protein CTCF. CTCF binding recruits the cohesin complex and induces APA changes across a few hundred genes71. Mechanistically, CTCF and cohesin may act as a roadblock for Pol II. In addition, CTCF and cohesin binding gave rise to new chromatin loops and altered the phosphorylation status of the Pol II C-terminal domain (CTD) (see Pol II-associated elongation and termination factors). As the CTD is a crucial regulator of Pol II-mediated regulation of mRNA processing at different stages of transcription, the changes in Pol II CTD modifications may contribute to the increased use of proximal PASs through reduced DNA methylation71.

Pol II-associated elongation and termination factors

The speed of Pol II largely depends on CTD-associated elongation factors and on the phosphorylation status of the CTD65,66,71,73,74,75, which functionally orients Pol II with respect to its location within a gene74. The CTD consists of 52 heptad repeats of the consensus sequence YSPTSPS. Several of these residues are phosphorylated by cyclin-dependent kinase 9 (CDK9), CDK12 and CDK13 and dephosphorylated by the phosphatases SSU72, FCP1 (also known as CTD phosphatase 1) and PP2A33,34,74,76,77,78,79. The association between CTD phosphorylation and APA was discovered in the context of cancer, where CDK12 loss-of-function mutations were associated with increased expression of IPA isoforms of DNA damage repair genes33,34. The increased IPA expression caused lower expression of full-length proteins, which inactivated the DNA damage response33,34,80. Loss of CDK12 impairs transcript elongation rates at many genes, but DNA repair genes are predominantly affected because of their substantial gene lengths and lower U1 small nuclear RNA to PAS ratios, which makes them more susceptible to intronic PAS use34 (Fig. 3b).

Transcription termination involves the dissociation of Pol II from the DNA template several kilobases downstream of the PAS66,81. CPA and transcription termination are separate events, but they are functionally linked. Pol II transition through a functional PAS is required for transcription termination, and slowdown of Pol II is required for proper CPA75,81. The phosphorylation status of the transcription elongation factor SPT5 is crucial for elongation and termination, and is regulated by CDK9 and the phosphatase PNUTS–PP1 (ref.75). PAS use was altered by depletion of CDK9 or SPT5 in a reporter assay14. Impaired activity of enzymes that regulate elongation and termination causes premature transcription termination, which leads to degradation of the nascent transcript before the distal PAS is transcribed, resulting in an APA pattern that may appear as increased IPA isoform or short 3′ UTR isoform expression and reduced full-length mRNA expression14,33,34.

Factors that bridge Pol II and the CPA complex

The strong effect of CDK12 on APA is also caused by its ability to recruit the PAF1 complex to Pol II (refs.65,82,83). The PAF1 complex bridges Pol II and the CPA machinery, and thus integrates the regulation of transcription elongation and termination and CPA. It connects elongating Pol II with the CPA machinery through its interaction with the CPSF and CSTF complexes82. Loss of the PAF1 complex and of PAF1-recruited CPSF3 from elongating Pol II leads to a transcription termination defect, which indicates that the complex is important for the regulation of APA and IPA14,35,83,84.

Additional connections between Pol II and the CPA complex are formed by RNA-binding proteins with Pol II CTD-interacting domains, including SR-related and CTD-associated factor 4 (SCAF4), SCAF8 and PCF11 (refs.29,66). PCF11 is a subunit of cleavage factor II (CFII) and regulates expression or APA of ~50% of human genes. PCF11 enhances transcription termination and stimulates CPA at proximal PASs, thereby resulting in 3′ UTR shortening, but regulation of expression or regulation of APA occurs at different genes29. SCAF4 and SCAF8 interact with Pol II, the PAF1 complex and CPA factors66. In human cells, SCAF4 and SCAF8 suppress transcription termination at intronic PASs to allow expression of full-length transcripts. Absence of both SCAF4 and SCAF8 upregulates more than 1,300 IPA isoforms in a sequence-specific manner, as most of these genes contain SCAF4- and SCAF8-binding sites upstream of their PASs66.

Identification of all classes of APA regulators in a single genetic screen

The importance of Pol II dynamics for APA regulation was revealed by a single genetic suppressor screen performed in Caenorhabditis elegans35 (Fig. 3c). This screen identified Pol II-associated factors, enzymes that regulate their activity and factors that bridge Pol II and the CPA machinery. APA at unc-44, which encodes the cytoskeleton protein ankyrin, was shown to be essential for neuron maturation in C. elegans. Immature neurons mostly express unc-44 mRNA isoforms generated by use of an intronic PAS, which encode ankyrin (Fig. 3c). By contrast, mature neurons produce longer mRNA isoforms that include additional exons, which encode giant ankyrin. Casein kinase 1δ (CK1δ; encoded by kin-20) was identified as a regulator of the switch from IPA to full-length ankyrin expression35. CK1δ-deficient worms do not express giant ankyrin, which results in impaired maturation of motor neurons and largely immobile worms. In a subsequent screen for suppressor mutations, 13 genes were identified that, when knocked out, were able to restore the generation of mature neurons in kin-20-knockout worms35.

All the identified mutations enable expression of giant ankyrin and affect either CPA or transcription elongation and termination35,85 (Fig. 3c). They include mutations that disrupt the intronic PAS of the ankyrin gene, and mutations in SSU72, a direct target of CK1δ that acts as a Pol II CTD phosphatase with a crucial role in the transition of Pol II from elongation to termination77,86. Moreover, the screen identified PIN1, which is a prolyl cis–trans isomerase that enhances the phosphatase activity of SSU72, and genes that encode components of Pol II, CPA factors and two subunits of the PAF1 complex65,82,83,87. That study represents a paradigm for context-dependent regulation of individual mRNA isoforms at specific time points during development by enzymes that control ubiquitous processes, such as transcription elongation and termination. Although CK1δ appears to regulate intronic PAS readthrough only at a single gene, the mechanism by which this enzyme acts in a gene-specific manner is currently unclear.

Gene-specific APA regulation

All elongation, termination and CPA factors involved in APA can be regulated at the levels of abundance and activity. In this section we explain how cell states or cell types induce changes in mRNA processing of individual genes. In some examples, APA changes can reinforce the cell state.

Early studies detected CPA factors at promoters88,89,90,91,92. In yeast, promoters are known to regulate many post-transcriptional processes, including mRNA stability, cytoplasmic localization and translation93,94,95,96,97. Two major conceptual models by which promoters regulate processes at mRNA 3′ ends have been proposed and are currently difficult to disentangle. In the promoter loading model, factors that are recruited to promoters travel along the gene with Pol II, thus making them available at downstream sites98. This model is supported by a cell-type-specific binding pattern of RNA-binding proteins at human promoters and enhancers99. As RNA-binding proteins are hubs for protein–protein interactions100,101,102,103, RNA-binding proteins at active promoters have the potential to be major regulators of APA. In the gene looping model, the transcription start site and the PAS physically interact through looping of the chromatin between them, thus allowing promoters to directly regulate 3′-end formation104,105,106. Gene loops between the promoter and either of the alternative PASs in CYC1 in yeast were detected; interestingly, changes in gene loop structure were observed upon alteration of growth conditions, which correlated with alternative PAS use106. Both models are supported by the findings discussed below.

Alternative transcription start sites

Regulation of 3′ UTR isoforms of individual genes by transcription factors can be explained by the existence of alternative transcription start sites composed of different cis-regulatory elements, which allow the binding of factors that determine 3′-end formation. Evidence for this hypothesis was found in fly neurons, where the presence of the RNA-binding protein Elav at gene promoters was associated with the production of mRNA isoforms with long 3′ UTRs36. Accordingly, swapping of promoter sequences between genes changed 3′ UTR length (Fig. 4a). Combined evidence from RNA-seq and long-read sequencing further revealed significant co-occurrence between specific transcription start sites and specific alternative 3′ UTR isoforms in human cells, organoids and rat hippocampal neurons37,38. In C. elegans, promoters were found to determine different modes of transcription termination107.

Fig. 4: Gene-specific regulation of alternative 3′ UTR isoforms.

a | Differential binding of factors at alternative transcription start sites can mediate gene-specific regulation of alternative polyadenylation site (PAS) use. In fly neurons, the RNA-binding protein embryonic lethal abnormal visual system (Elav) binds to specific promoters and transcription start sites (TSS) and thus controls the expression of mRNAs with longer 3′ untranslated regions (UTRs). b | Transcription factors that bind to specific enhancers regulate APA in a gene-specific manner. Signalling-induced activation of nuclear factor-κB (NF-κB) induces its binding to a specific enhancer of the phosphatase and tensin homologue gene (PTEN) in human cell lines. NF-κB activation does not change the production or stability of the PTEN mRNA, but induces a change in 3′ UTR isoform expression. Deletion of the enhancer or silencing of NF-κB impairs this signalling-induced APA change. c | Signalling-induced phosphorylation of cleavage and polyadenylation specificity factor subunit 6 (Cpsf6) increases its activity and promotes autophagy. In flies, inactivation of the mTOR pathway allows expression of two kinases (not shown) that phosphorylate Cpsf6 in the cytoplasm, thereby promoting its translocation to the nucleus and RNA-binding activity. Cpsf6 changes the APA pattern of two master regulators of autophagy, the autophagy-related protein 1 gene (Atg1) and Atg8a. Increased expression of their long 3′ UTR (LU) isoforms supports high-level protein expression, thereby inducing autophagy upon mTOR inhibition to allow intracellular nutrient uptake. 3′-seq, 3′-end sequencing; SU, short 3′ UTR.

Alternative enhancers

Enhancer-mediated recruitment of factors to promoters was recently suggested to regulate APA in human cells14. Although it is thought that transcriptional enhancers regulate transcript production, CRISPR–Cas-mediated deletion of a promoter-proximal enhancer did not change the production or stability of the PTEN mRNA, but instead it changed APA14. APA was regulated by the transcription factor NF-κB, which binds to the PTEN enhancer. Signalling-induced activation of NF-κB resulted in 3′ UTR shortening, whereas silencing of NF-κB impaired the signalling-induced APA change14 (Fig. 4b). Moreover, cell-type-specific enhancers were found to be significantly associated with 3′ UTR shortening during differentiation14.

The transcription factor BMAL1 controls expression of individual 3′ UTR isoforms

Circadian gene expression is regulated by transcriptional and translational negative feedback loops, initiated by the transcription factor CLOCK–BMAL1, which binds to promoters of genes expressed in a circadian manner108. As reported in a recent preprint, 3′-seq analyses performed over a 24-h period identified hundreds of APA changes that correlated with the circadian rhythm109. When comparing circadian changes in gene expression and in APA, the study authors identified nearly 500 genes with a circadian expression pattern, whose alternative 3′ UTRs did not oscillate. Conversely, in a similar number of genes that were not rhythmically expressed, 3′ UTR isoforms with circadian oscillation were observed109. This recent study provides one of the most surprising findings in APA research, as it detected independent expression regulation of individual 3′ UTR isoforms. By contrast, in the past it was thought that alternative 3′ UTRs generated from a gene are linked, and it was predicted that upregulation of one isoform will downregulate the other isoform when gene expression remains constant3,27,39,40.

To identify how expression of individual 3′ UTR isoforms is regulated, the aforementioned study analysed APA changes in Bmal1-knockout mice (Bmal1 is also known as Arntl), and found that the majority of dysregulated APA events involved the circadian expression of individual 3′ UTR isoforms, while expression of the non-regulated isoforms remained fairly constant109. This result suggests a widespread lack of compensatory regulation, and instead indicates that BMAL1-bound promoters largely confer independent expression control of alternative 3′ UTR isoforms.
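The distinction between linked and independent isoform regulation can be made concrete with a small Python sketch. Under the linked view, a rise in one isoform is offset by a fall in the other; under independent regulation, one isoform changes while the other stays roughly flat. The expression values, the 10% tolerance and the function name are hypothetical and are not taken from the preprint.

```python
def classify_isoform_change(su_before, lu_before, su_after, lu_after, tol=0.1):
    """Classify a two-isoform change as 'independent', 'compensatory' or 'other'.

    tol is the relative change below which an isoform counts as unchanged
    (an arbitrary illustrative threshold).
    """
    if su_before <= 0 or lu_before <= 0:
        raise ValueError("baseline expression must be positive")
    d_su = (su_after - su_before) / su_before  # relative change, short isoform
    d_lu = (lu_after - lu_before) / lu_before  # relative change, long isoform
    su_changed = abs(d_su) > tol
    lu_changed = abs(d_lu) > tol
    if su_changed != lu_changed:
        return "independent"    # only one isoform changes
    if su_changed and lu_changed and d_su * d_lu < 0:
        return "compensatory"   # isoforms change in opposite directions
    return "other"              # both unchanged, or both change in the same direction

print(classify_isoform_change(50, 50, 50, 100))  # only the long isoform rises
print(classify_isoform_change(50, 50, 25, 75))   # short falls, long rises
```

A real analysis would need replicate-aware statistics rather than a fixed threshold; the sketch only separates the two regulatory scenarios described above.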

Regulation of APA factors by post-translational modifications

Gene-specific regulation of APA in response to environmental signals is further accomplished by kinases and ubiquitin ligases that post-translationally modify CPA factors and other RNA-binding proteins. Starvation inhibits mTOR and activates autophagy, a mechanism that provides nutrients from intracellular sources31. In flies, autophagy is stimulated chiefly by upregulation of autophagy-related protein 1 (Atg1) and Atg8a, through expression of their long 3′ UTR isoforms, which produce more protein than their short isoforms31 (Fig. 4c). This 3′ UTR lengthening was controlled by two kinases, which are expressed in cells with inactive mTOR and phosphorylate Cpsf6, a subunit of CFI. Phosphorylation of CPSF6 in human cells and of Cpsf6 in flies increases its activity by promoting its nuclear localization and RNA-binding capacity31,110 (Fig. 4c).

Ubiquitin ligases are recruited to specific substrates by adapter proteins. As adapter proteins are often expressed in a pathway-specific or cell-type-specific manner, this regulatory mode allows context-dependent regulation of widely expressed ubiquitin ligases111,112. For example, expression of the ubiquitin adapter melanoma-associated antigen 11 (MAGEA11) is restricted to germ cells, but it is activated in cancer, where it recruits the ubiquitin ligase HUWE1 to the substrate PCF11. PCF11 ubiquitylation changes the composition of the CPA machinery by expelling NUDT21 (also known as CPSF5), which results in predominant 3′ UTR shortening111. MAGEA11 upregulation in cancer may contribute to the observed 3′ UTR shortening in solid cancers, and could provide a new avenue for therapeutic intervention112,113,114.

These examples illustrate how APA is dynamically regulated and responds to diverse external signals. These signals include cold exposure, nutrient availability, immune pathway activation and developmental cues5,14,31,35,48,109,110,115. Environmental changes often affect only a subset of multi-UTR genes and can be accomplished by pathway-specific enzymes. The gene-specific APA response to environmental and developmental stimuli makes APA an integral component of all gene regulation circuits.

Post-transcriptional regulation

The production of 3′ UTR isoforms is controlled co-transcriptionally, but their expression can be further altered by several mechanisms of post-transcriptional regulation. It is estimated that 10–30% of multi-UTR genes are subject to this type of regulation3,27,39,40,41,109,116. The mechanisms include sequential CPA, nuclear RNA degradation, regulation of mRNA export and control of mRNA stability113,115,117,118,119,120,121 (Fig. 2). Differential nuclear export can shape 3′ UTR isoform ratios, as long 3′ UTRs rely much more on the presence of export receptors than do shorter 3′ UTRs117,118,119.

Most techniques that assess 3′ UTR isoforms cannot distinguish between transcriptional regulation and post-transcriptional regulation. However, 3′-seq performed on nuclear and cytoplasmic fractions revealed differential expression of 3′ UTRs for approximately 450 genes. In most of these genes, the longer 3′ UTRs were more abundant in the nucleus than in the cytoplasm116. Moreover, some genes undergo sequential CPA in the nucleus121. The distal PAS, which is often stronger, is processed first, but owing to incomplete mRNA processing at upstream elements such as introns, the mRNA is retained in the nucleus121. Subsequent processing at a proximal PAS can then release the mRNA for export to the cytoplasm. That study revealed that the processing efficiency of specific 3′ UTRs does not always correlate with their cytoplasmic expression121.
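The comparison of nuclear and cytoplasmic fractions can be sketched in Python as a difference in the fraction of reads assigned to the long 3′ UTR isoform. The read counts and function names are hypothetical, and the cited study may have used a different statistic.

```python
def long_utr_fraction(su_reads, lu_reads):
    """Fraction of 3'-seq reads that map to the long 3' UTR isoform."""
    total = su_reads + lu_reads
    if total == 0:
        raise ValueError("at least one read is required")
    return lu_reads / total

def nuclear_enrichment(nuclear, cytoplasmic):
    """Long-isoform fraction in the nucleus minus that in the cytoplasm.

    Each argument is a (su_reads, lu_reads) tuple. A positive value means
    that the long 3' UTR isoform is relatively more abundant in the nucleus.
    """
    return long_utr_fraction(*nuclear) - long_utr_fraction(*cytoplasmic)

# Hypothetical gene: the long isoform dominates in the nucleus only
print(nuclear_enrichment(nuclear=(40, 60), cytoplasmic=(75, 25)))
```

A positive value is compatible with nuclear retention or reduced export of the long isoform, but on its own it cannot distinguish these from differences in cytoplasmic stability.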

Following successful nuclear export, mRNAs with longer 3′ UTRs were observed at intracellular membranes122. As reported in a recent preprint, compartment-specific 3′-seq further identified post-transcriptional regulation as the predominant regulatory mode of ~400 circadian 3′ UTR isoforms whose corresponding genes were not expressed in a circadian manner109.

Global 3′ UTR changes and phenotypes

Single-UTR genes and multi-UTR genes differ substantially in their expression patterns. Conceptually, chromatin regulators and CPA factors can fulfil similar roles in shaping the cellular transcriptome. Whereas chromatin regulators shape the gene expression landscape, CPA factors that induce global shifts in APA may set up 3′ UTR isoform landscapes that then accommodate further 3′ UTR isoform changes.

Genetic variants in 3′ UTRs have many phenotypes

To evaluate the functional relevance of 3′ UTRs, several genome-wide association studies assessed the contribution of 3′ UTR genetic variants to phenotypic traits42,43,44. Non-coding regions harbour most genetic variants123. In addition to genetic variants that correlate with gene expression level, genetic variants were identified that are associated with changes in APA, termed ‘APA quantitative trait loci’ (apaQTL) or ‘3′ aQTL’. Correlation of apaQTL with phenotypes revealed that 15–19% of apaQTL are associated with heritability of known human traits or diseases42,43,44. This association indicates that a substantial fraction of phenotypic variation observed across individuals can be tied to changes in APA, which most often result from variants that alter the PASs or binding sites of RNA-binding proteins located in 3′ UTRs44. Interestingly, these studies found that the genetic variants associated with APA rarely overlap with genetic variants associated with gene expression changes, suggesting that the two regulatory mechanisms are largely independent43,44. These results suggest that most trait-associated apaQTL contribute to human traits and diseases in a way that does not involve changes in mRNA abundance.

Control of single-UTR genes versus multi-UTR genes

Comparison of changes in gene expression and in 3′ UTR isoform use across cell types revealed that in only a minority (10–15%) of multi-UTR genes do both parameters change simultaneously, suggesting that the two gene regulatory mechanisms are largely independent3,27,39,40,41. This finding was recently confirmed by a large-scale study, available as a preprint, that compared changes in gene expression and 3′ UTR isoform expression across 120 mouse cell types5. The analysis showed that during differentiation, a gene either changes its expression or changes its APA, except for a small minority of genes that altered both. Remarkably, the number of genes with 3′ UTR isoform changes between cell types was fairly similar to the number of genes with expression changes.
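The comparison described above can be sketched in code. The following minimal Python example assumes hypothetical per-gene inputs (a log2 fold change in gene expression and a change in the fraction of the long 3′ UTR isoform between two cell types) and arbitrary illustrative thresholds. None of the names, cut-offs or toy values come from the cited studies.

```python
from collections import Counter

# Illustrative thresholds (assumptions, not taken from the cited studies).
EXPR_LOG2FC_CUTOFF = 1.0  # |log2 fold change| counted as an expression change
APA_DELTA_CUTOFF = 0.15   # |change in long 3' UTR fraction| counted as an APA change


def classify_gene(log2fc, delta_lu_fraction):
    """Assign a multi-UTR gene to one of four regulatory categories."""
    expr_changed = abs(log2fc) >= EXPR_LOG2FC_CUTOFF
    apa_changed = abs(delta_lu_fraction) >= APA_DELTA_CUTOFF
    if expr_changed and apa_changed:
        return "both"
    if expr_changed:
        return "expression_only"
    if apa_changed:
        return "apa_only"
    return "unchanged"


def summarize(genes):
    """Count genes per category; `genes` maps gene name -> (log2fc, delta)."""
    return Counter(classify_gene(fc, delta) for fc, delta in genes.values())


# Hypothetical toy values, for illustration only.
toy_genes = {
    "geneA": (2.1, 0.02),
    "geneB": (0.1, -0.30),
    "geneC": (-1.5, 0.25),
    "geneD": (0.2, 0.01),
}
category_counts = summarize(toy_genes)
```

In such a tally, the "both" category corresponds to the minority of genes in which expression and 3′ UTR isoform use change together; the size of each category depends strongly on the chosen thresholds.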

Moreover, the mode of regulation of cell-type-specific expression differs substantially between single-UTR genes and multi-UTR genes (Fig. 5a,b). Gene expression of most (~80%) single-UTR genes is regulated developmentally3,5. Chromatin accessibility and transcription factor expression determine whether these genes are transcribed or not transcribed in specific cell types (Fig. 5a). The remaining ~20% of single-UTR genes are classified as ‘housekeeping genes’, whose transcription is switched on in nearly all cell types3,5.

Fig. 5: Cell-type-specific regulation of single-UTR and multi-UTR genes is accomplished by different regulatory modes.

a | Chromatin accessibility and transcription factor (TF) binding determine cell-type-specific transcription of genes whose mRNAs contain a single 3′ untranslated region (UTR) isoform (single-UTR genes). b | Multi-UTR genes are widely transcribed (like single-UTR housekeeping genes) yet change their 3′ UTR isoforms in a cell-type-specific manner. The cell-type-specific 3′ UTR landscape is determined largely by cell-type-specific expression of regulators of alternative cleavage and polyadenylation, such as cleavage and polyadenylation specificity factor subunit 5 (CPSF5; encoded by NUDT21). c | Cooperativity between a cell-type-specific 3′ UTR landscape and activation of signalling pathways or transcription factors determines phenotypic outcomes. 3′-seq, 3′-end sequencing; CPA, cleavage and polyadenylation; iPSC, induced pluripotent stem cell; LU, long 3′ UTR; OKSM, OCT4, KLF4, SOX2 and MYC; PAS, polyadenylation site; SU, short 3′ UTR.

At the transcriptional level, multi-UTR genes appear similar to housekeeping genes, as they are expressed in most cell types3,5 (Fig. 5b). Importantly, however, these genes express their alternative 3′ UTR isoforms in a cell-type-specific manner (Fig. 5b). This APA pattern is largely determined by cell-type-specific abundance of APA regulators, including CPSF5 or CPSF6, two subunits of CFI (refs.24,26,27,30,124) (Fig. 1). Many proteins with regulatory functions are encoded by multi-UTR genes (including transcription and chromatin regulators, ubiquitin enzymes, kinases and factors involved in mRNA or protein transport), suggesting that many of them are regulated by APA3.

APA regulators influence phenotypes by changing the 3′ UTR landscape

APA is globally dysregulated in cancer, predominantly exhibiting 3′ UTR shortening in solid tumours113,114. One of the APA regulators that is most relevant to 3′ UTR shortening is NUDT21 (CPSF5), whose downregulation by small hairpin RNAs results in 3′ UTR shortening of nearly 1,500 genes24,26,27,30. NUDT21 mRNA levels are significantly reduced in several solid tumours, and low NUDT21 expression levels correlate with a poor prognosis in several cancers26,125,126,127,128. Although global 3′ UTR shortening is widespread in cancer, it has been largely unclear what the shift towards expression of shorter 3′ UTRs means for cancer-specific phenotypes.

Notably, loss of NUDT21 results in 3′ UTR shortening of genes that are enriched in the RAS signalling pathway127. Depletion of NUDT21 enhances RAS-activation phenotypes, such as increased proliferation and migration of glioblastoma cell lines, and induces a RAS-activation multivulva phenotype in C. elegans127,129. Loss of NUDT21 strongly increased the oncogenic phenotype of human colon carcinoma cells with oncogenic RAS gene mutations129. These data suggest that a global APA regulator such as NUDT21 sets up a certain 3′ UTR isoform landscape that may alter the responsiveness of specific genes to additional signals (Fig. 5c). Cooperativity between the underlying 3′ UTR landscape and specific oncogenic pathways likely contributes to the susceptibility of certain cell types to particular oncogenes.

Suppression of NUDT21 also showed a cooperative effect with transcription factors during reprogramming of somatic cells into induced pluripotent stem cells, as suppression of NUDT21 increased reprogramming efficiency by more than tenfold27. Depletion of NUDT21 in mouse embryonic fibroblasts resulted in 3′ UTR shortening of more than 1,500 genes. This 3′ UTR landscape makes cells receptive to the activity of reprogramming transcription factors, and simultaneously increases their resistance to differentiation signals (Fig. 5c).

Such a regulatory pattern suggests that NUDT21 influences the 3′ UTR landscape in a manner similar to the shaping of gene expression by chromatin regulators. Whereas chromatin regulators establish a distinct epigenetic state to influence cell-specific gene expression130, we propose that APA regulators set up a distinct 3′ UTRome, which makes cells resistant or receptive to additional gene or 3′ UTR isoform changes. Cooperativity between the gene expression programme, the underlying APA landscape and the gene-specific changes in the expression of 3′ UTR isoforms determines the resulting phenotype26,27,129 (Fig. 5b,c).

Functions of alternative 3′ UTRs

In this section, we discuss functions of alternative 3′ UTRs, including the regulation of protein output, and translation in specific subcellular environments that enable post-translational modifications or allow the formation of specific protein complexes.

Alternative 3′ UTRs can regulate the expression levels of their encoded proteins through several mechanisms, including modulation of mRNA export, mRNA stability and mRNA translation rates. RNA-binding proteins and microRNAs can differentially bind and regulate transcripts with long and short 3′ UTRs. In cancer cells, some oncogenes can escape suppression by microRNAs through upregulation of short 3′ UTRs lacking the microRNA-binding sites113. Analyses of mRNA stability regulation through APA revealed that for more than two-thirds of genes the short and long 3′ UTR isoforms have a similar mRNA stability41. Among the genes whose alternative 3′ UTR isoforms differ in stability, the long isoform was the less stable one in ~90% of cases41. However, the effect of APA on mRNA stability and translation differs between genes, and several cases have been reported in which long 3′ UTR isoforms generate substantially more protein.

Regulation of protein abundance

Uncoupling protein 1 (UCP1) is a mitochondrial proton transporter that induces a proton leak across the inner mitochondrial membrane to shift energy production towards heat production. Cold exposure activates adrenergic signalling and increases UCP1 expression. Northern blot analysis revealed that mouse Ucp1 generates two mRNA isoforms with alternative 3′ UTRs48. The long 3′ UTR isoform of Ucp1 contributes only 5–10% of the total Ucp1 mRNA level. To fully delete the long 3′ UTR isoform of Ucp1, CRISPR–Cas9 was used to eliminate the 3′ UTR downstream of the proximal PAS in mice48. Despite the low abundance of the long isoform, its deletion reduced UCP1 expression by 50–60%. This experiment showed that UCP1 levels are predominantly regulated by translation of the long 3′ UTR isoform, which contains binding sites for the translation regulator cytoplasmic polyadenylation element-binding protein 2 (CPEB2). CPEB2 was required for polysome association of the long 3′ UTR isoform of Ucp1, for low level translation in steady-state conditions and for signalling-induced upregulation of translation in cold conditions or the presence of adrenaline48 (Fig. 6a).

Fig. 6: Examples of functions of alternative 3′ UTRs.

a | 3′ untranslated region (UTR)-dependent increase in protein synthesis. Cytoplasmic polyadenylation element-binding protein 2 (CPEB2) binds to the long 3′ UTR (LU) isoform of the uncoupling protein 1 gene (Ucp1), which increases translation efficiency and protein abundance of UCP1. b | 3′ UTR-dependent mRNA localization facilitates synthesis of proteins at their final destination. Shown is a general example of local protein synthesis at the synapse of neurons. c | 3′ UTR-dependent local translation in condensates enables protein complex formation. (A+U)-rich elements represent the binding sites for the RNA-binding protein TIS11B. mRNAs that contain these elements in their 3′ UTRs localize to TIS granules, whereas lack of these elements results in localization to the endoplasmic reticulum (ER) or to the cytoplasm. Local protein synthesis in TIS granules allows the formation of protein complexes that cannot be established upon translation outside this cytoplasmic condensate. Shown is the compartment-dependent assembly of the complex between CD47 and SET, which traffics CD47 more efficiently to the plasma membrane, where it represses phagocytosis. d | 3′ UTR-dependent protein complex assembly determines protein function. A change in 3′ UTR isoform expression of the gene encoding the ubiquitin ligase BIRC3 leads to assembly of a protein complex, which changes BIRC3 function in human cells. 3′-seq, 3′-end sequencing; LU, long 3′ untranslated region; SU, short 3′ untranslated region.

polo mRNA is another example in which loss of long 3′ UTR isoform expression (through deletion of the distal PAS) results in decreased protein abundance45. polo encodes a kinase involved in cell cycle progression in flies. Insufficient Polo expression impairs cell proliferation and causes defects in abdominal wall development, resulting in embryonic lethality45. Interestingly, the same phenotype was observed when the transcript elongation rate of Pol II was reduced45. The slower elongation rate was caused by a mutation in the largest subunit of Pol II and changed APA of polo towards an increase in short 3′ UTR isoform expression at the expense of long 3′ UTR isoform expression. This example illustrates how APA of polo is regulated by changes in transcription elongation rate.

Local translation in neurons

Local protein synthesis in neurons often depends on specific 3′ UTRs, because they contain elements that are necessary for mRNA localization58,60,131,132,133,134,135,136,137,138,139,140. For example, steady-state and signalling-induced local synthesis of mTOR in axons depends on the presence of its 3′ UTR137 (Fig. 6b). Axon injury increases the local production of mTOR and activates mTOR through phosphorylation. Active mTOR then induces local protein synthesis from hundreds of mRNAs that often contain mTOR-regulated elements. Absence of the Mtor 3′ UTR prevents stress-dependent upregulation of local protein synthesis, which leads to neuronal cell death following injury137.

Extracellular signals can promote local protein synthesis of a transcription factor at the synapse, followed by retrograde trafficking to the nucleus of the neuron and activation of a signalling-induced transcriptional programme135,136,141. For example, the 3′ UTR of the gene encoding the mouse chromatin remodeller high mobility group nucleosome-binding domain-containing protein 5 (HMGN5) directs its translation to growth cones of neurons136. Only HMGN5 that was locally synthesized in growth cones was phosphorylated and was able to mediate transcription activation after being trafficked to the nucleus136. In this example, 3′ UTRs are being used to allow translation in a special subcellular environment, such as the growth cone, the properties of which can be controlled by environmental stimuli.

Environmental signals often affect only particular sets of neurons141. This can be accomplished by differential expression of surface receptors or through alternative 3′ UTRs of transcription factors or other regulatory proteins. In the case of cyclic AMP-responsive element-binding protein (CREB), mTOR and HMGN5, only specific 3′ UTR isoforms enable local synthesis of their proteins, which can then respond to environmental signals. As only the neurons that express these 3′ UTR isoforms are able to respond, alternative 3′ UTR isoform expression can establish a selective response to the signals60,135,136,137.

Local translation in condensates

APA also allows mRNA isoforms with specific 3′ UTRs to be translated at membranes, or in biomolecular condensates such as TIS granules10,47,142,143. CD47, which encodes a plasma membrane protein that acts as a phagocytosis-suppression signal, generates mRNA isoforms with alternative 3′ UTRs46. CD47 is translated at the endoplasmic reticulum, but the CD47 mRNA isoform with the long 3′ UTR is translated in endoplasmic reticulum-associated TIS granules, a new type of cytoplasmic membraneless organelle that is generated through assembly of the RNA-binding protein TIS11B (also known as ZFP36L1) together with the mRNAs that it binds47. Translation in TIS granules was required for assembly of a protein complex comprising CD47 and the trafficking factor SET, which promotes the localization of CD47 to the plasma membrane, thereby inhibiting phagocytosis of the cell46,47. mRNA localization to TIS granules, followed by local translation, requires (A+U)-rich elements in the long 3′ UTR of CD47 mRNA, which function as TIS11B-binding sites (Fig. 6c). As the short 3′ UTR isoform lacks these motifs, it is not preferentially translated in TIS granules, and the CD47 protein that it encodes is unable to interact with SET, leading to diminished protection of the cells from phagocytosis46,47. This example shows that 3′ UTRs can control protein complex assembly, which in turn affects protein function.

Formation of protein complexes

The membraneless processing bodies (P-bodies) are cytoplasmic condensates that concentrate RNA-binding proteins and RNAs. In yeast, P-bodies are nucleated through interactions between 40S ribosomal protein S28-B (Rps28b) and enhancer of mRNA-decapping protein 3 (Edc3)144. Intriguingly, 3′ UTR-dependent protein complex formation enhances assembly of P-bodies. Complex assembly between Rps28b and Edc3 is strongly promoted by the RPS28B 3′ UTR, which binds and recruits Edc3 to the site of Rps28b translation. In the absence of the RPS28B 3′ UTR, the formation of the protein–protein interaction is very inefficient. Importantly, the RPS28B 3′ UTR does not regulate the abundance of Rps28b144.

A similar scenario was observed for the human ubiquitin ligase BIRC3, which is best known as a regulator of cell death49,145. Cells highly expressing BIRC3 are better protected from apoptosis, indicating that this function is regulated by BIRC3 abundance and is mediated through interaction with caspases and factors of the NF-κB pathway49,145. BIRC3 also regulates cell migration through regulation of the abundance of C-X-C chemokine receptor type 4 (CXCR4) at the cell surface. However, only BIRC3 encoded by the mRNA with the long BIRC3 3′ UTR is able to control migration49. Mechanistically, only BIRC3 that was encoded by BIRC3 mRNA with the long 3′ UTR is incorporated in a 3′ UTR-dependent manner into a complex that regulates CXCR4 surface expression (Fig. 6d). Interestingly, BIRC3 abundance was not the determining factor for the control of migration, as leukaemia cells with very low expression of BIRC3, generated by the long 3′ UTR mRNA isoform, were still able to migrate in a long BIRC3 3′ UTR-dependent manner49. These examples illustrate how alternative 3′ UTRs can facilitate the assembly of alternative protein complexes, indicating that 3′ UTRs are integral regulators of alternative protein function.

Novel technologies for studying APA

New sequencing approaches provide unprecedented spatial and temporal resolution of APA isoform expression. Furthermore, research into the functional significance of APA has been catalysed by CRISPR–Cas technologies, which facilitate the manipulation of 3′ UTRs in their endogenous genomic contexts.

APA quantification by scRNA-seq

In the past, only a few laboratories analysed APA, as specialized 3′-end sequencing protocols were required for faithful 3′ UTR isoform quantification3,4,21,40,146. The recent development of 3′-tagged scRNA-seq protocols, together with several new computational pipelines, is changing this situation, as publicly available datasets can now be used to quantify gene and 3′ UTR isoform expression from virtually any cell type5,50,51,52,147. Comparison of the technical variation of APA between previously used bulk 3′-end sequencing protocols and 10x Genomics scRNA-seq datasets found that the new method is substantially more reproducible5. This technological advance has enabled APA analysis at all stages of cell differentiation and revealed global shortening of 3′ UTRs during haematopoiesis and spermatogenesis5,51,52,148,149. These results have challenged the prior notion that differentiation processes generally induce expression of longer 3′ UTRs. Interestingly, APA analysis using single-nucleus RNA-seq of post-mortem brain tissue revealed predominantly longer 3′ UTRs in people with autism in comparison with healthy individuals150,151. These analyses demonstrate how technical improvement may soon enable even more sophisticated studies of APA-related disease phenotypes.
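The kind of quantification described above can be illustrated with a minimal sketch. It assumes that reads from a 3′-tagged scRNA-seq experiment have already been assigned to a proximal or distal PAS and to a cell type, and it simplifies each gene to two PASs (real multi-UTR genes can have more). The record layout, the coverage filter and the function names are hypothetical and do not reproduce any of the published pipelines.

```python
from collections import defaultdict

MIN_READS = 10  # hypothetical coverage filter; below this, usage is not reported


def pool_counts(records):
    """Sum per-cell read counts into per-cell-type totals.

    `records` is an iterable of (cell_type, gene, pas, count) tuples,
    where `pas` is either "proximal" or "distal".
    """
    pooled = defaultdict(lambda: {"proximal": 0, "distal": 0})
    for cell_type, gene, pas, count in records:
        pooled[(cell_type, gene)][pas] += count
    return pooled


def long_utr_fraction(proximal, distal, min_reads=MIN_READS):
    """Fraction of reads at the distal PAS, or None if coverage is too low."""
    total = proximal + distal
    if total < min_reads:
        return None
    return distal / total


def usage_table(records):
    """Map (cell_type, gene) -> long 3' UTR fraction."""
    return {
        key: long_utr_fraction(counts["proximal"], counts["distal"])
        for key, counts in pool_counts(records).items()
    }


# Hypothetical toy records, for illustration only.
toy_records = [
    ("type_1", "geneA", "proximal", 5),
    ("type_1", "geneA", "distal", 15),
    ("type_2", "geneA", "proximal", 12),
    ("type_2", "geneA", "proximal", 4),
    ("type_2", "geneA", "distal", 4),
    ("type_2", "geneB", "distal", 3),
]
usage = usage_table(toy_records)
```

Pooling counts per cell type before computing the ratio is one simple way to cope with the sparse coverage of individual cells; a lower long 3′ UTR fraction in one cell type than in another would indicate relative 3′ UTR shortening for that gene.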

Manipulation of alternative 3′ UTRs

Functional analyses of 3′ UTRs used to rely mainly on reporter gene assays and transient overexpression of plasmid-encoded constructs113,115. Although luciferase or GFP reporters are sensitive tools for measuring expression differences, they are not suitable for evaluating other 3′ UTR-mediated functions8. This bias in experimental readout might have contributed to the notion that 3′ UTRs have evolved mainly as means of gene expression regulation. Furthermore, vector-based reporter systems do not faithfully recapitulate the endogenous control of 3′ UTR isoform expression and function by promoters, enhancers, 5′ UTRs and introns14,36,152,153,154; therefore, whenever possible, functional characterization of 3′ UTRs should be performed in their endogenous genomic contexts. Such approaches have become feasible with several recently developed genome editing technologies.

Shifting APA through use of small hairpin RNAs and ASOs

Small hairpin RNAs or small interfering RNAs that target the unique sequence of long 3′ UTR isoforms can be used to specifically deplete these isoforms49 (Fig. 7a). To downregulate short 3′ UTR isoforms, modified antisense oligonucleotides (ASOs) that bind to sequences surrounding proximal PASs can be used. These ASOs can physically block access of the CPA machinery to the nascent RNA, thus preventing 3′-end formation at the blocked PAS155,156,157 (Fig. 7b). Similarly, ASO targeting of intronic PASs was used to shift mRNA processing towards splicing of the intron, resulting in increased expression of the full-length protein152.

Fig. 7: Manipulation of alternative 3′ UTR expression.

a | Small hairpin RNAs or small interfering RNAs (siRNAs) targeting the extended part of 3′ untranslated region (UTR) isoforms exclusively downregulate the expression of long 3′ UTR (LU) isoforms. b | Antisense oligonucleotides (ASOs) that are complementary to a specific proximal polyadenylation site (PAS) prevent binding of the cleavage and polyadenylation (CPA) machinery and exclusively downregulate expression of short 3′ UTR (SU) isoforms. c | CRISPR–iPAS. An enzymatically dead Cas13 (dCas13) is targeted by a guide RNA (dCas13 RNP) to a region slightly upstream of a PAS of interest, where it prevents CPA factors from recognizing the PAS. d | CRISPRpas. An enzymatically dead Cas9 (dCas9) is targeted by a guide RNA to a region downstream of a PAS of interest, where it acts as a roadblock for RNA polymerase II (Pol II) elongation, thereby increasing CPA at the PAS. e | Permanent removal of a specific PAS or entire 3′ UTR isoforms using a pair of Cas9 RNPs to delete the region of interest. f | PAS mutagenesis through homologous recombination can be achieved by CRISPR–Cas9-mediated DNA cleavage in the presence of a homologous repair template that contains the mutation. Cas RNP, Cas protein–guide RNA complex; KO, knockout.

Shifting APA using CRISPR–Cas systems

PAS choice can be manipulated using CRISPR–Cas systems. For example, with CRISPR–iPAS, site-specific recruitment of catalytically dead RNA-specific Cas13 to proximal PASs seems to prevent access of CPA factors and shifts APA towards distal PAS use158 (Fig. 7c). Alternatively, with CRISPRpas, proximal PAS use is enhanced through recruitment of the DNA-specific catalytically dead Cas9 to sequences downstream of the PAS54 (Fig. 7d). It is thought that the Cas9 roadblock leads to pausing and disengagement of the transcribing Pol II complex, which appears to be particularly suitable for increasing the use of weak PASs.

In addition to the above-mentioned tools for transient modulation of PAS use, the CRISPR–Cas technology can also permanently change APA. CRISPR–Cas9 was used to insert exogenous PASs into 3′ UTRs. For example, a preprint reports overexpression of the neurotrophic factor GDNF in mice by introduction of a strong PAS downstream of its stop codon159. A similar approach in induced pluripotent stem cells serves as an example of therapeutic genome editing160. Myotonic dystrophy type I is caused by a toxic CUG repeat expansion in the 3′ UTR of DMPK. CRISPR–Cas-mediated insertion of two strong PASs upstream of the repeat permanently eliminated transcripts with the repeat expansion and led to reversal of the disease phenotype in skeletal muscle fibres derived from induced pluripotent stem cells160.

Deletion of 3′ UTRs or individual PASs using CRISPR–Cas

Several laboratories have used pairs of guide RNAs to generate defined genomic deletions to eliminate entire 3′ UTRs or specific PASs in cell lines or model organisms29,48,53,64,75,154,161,162 (Fig. 7e). In a related approach, when a homologous DNA template is provided, individual point mutations can be introduced into the PAS hexamer to generate precise sequence perturbations75 (Fig. 7f).

If the sequence context of the PAS contains a site for guide-RNA binding, a single guide RNA–Cas9 complex can be used to disrupt an individual PAS. DNA polymerase-η (Polη), which is encoded by POLH, performs DNA translesion synthesis and contributes to drug resistance in cancer; POLH deletion prevents the drug resistance163. POLH generates three 3′ UTR isoforms, and single guide RNAs were used to disrupt either the proximal PAS or the distal PAS. Each disruption had substantial effects on Polη expression, nearly fully recapitulating the effects of the gene deletion163. This result suggests that elimination of an individual PAS of a multi-UTR gene may be a strategy for downregulating gene and protein expression in experimental or therapeutic settings.
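As a rough illustration of the sequence-context requirement mentioned above, the sketch below scans a DNA sequence for PAS hexamers that overlap the predicted cut site of a guide RNA. It assumes the widely used Streptococcus pyogenes Cas9, which needs an NGG motif next to a 20-nucleotide target and cuts 3 bp away from that motif, and it assumes that the input is the sense strand of the gene. The cited study may have used different design rules, and the function names and toy sequences are hypothetical.

```python
import re

PAS_HEXAMER = "AATAAA"  # DNA form of the canonical AAUAAA hexamer
PROTOSPACER_LEN = 20    # assumed guide target length for S. pyogenes Cas9


def find_hexamers(seq):
    """Return 0-based start positions of every (possibly overlapping) hexamer."""
    return [m.start() for m in re.finditer(f"(?={PAS_HEXAMER})", seq)]


def cut_sites(seq):
    """Return (cut_index, strand) for each NGG motif with room for a full target.

    A cut_index of i means that the break falls between seq[i - 1] and seq[i].
    """
    sites = []
    n = len(seq)
    # Forward strand: 20-nt target followed by NGG; cut lies 3 bp before the N.
    for p in range(PROTOSPACER_LEN, n - 2):
        if seq[p + 1:p + 3] == "GG":
            sites.append((p - 3, "+"))
    # Reverse strand: appears as CCN followed by the 20-nt target on this strand.
    for q in range(0, n - PROTOSPACER_LEN - 2):
        if seq[q:q + 2] == "CC":
            sites.append((q + 6, "-"))
    return sites


def guides_hitting_pas(seq):
    """List (hexamer_start, cut_index, strand) where the cut falls in or at the hexamer."""
    seq = seq.upper()
    sites = cut_sites(seq)
    hits = []
    for h in find_hexamers(seq):
        for cut, strand in sites:
            if h <= cut <= h + len(PAS_HEXAMER):
                hits.append((h, cut, strand))
    return hits


# Hypothetical toy sequences, for illustration only.
forward_example = "ACTGAACTGAACTGA" + "AATAAA" + "T" + "AGG" + "ACTAT"
reverse_example = "CCT" + "AATAAA" + "ACTGAACTGAACTGA"
forward_hits = guides_hitting_pas(forward_example)
reverse_hits = guides_hitting_pas(reverse_example)
```

An empty result for a given PAS would suggest that, under these assumptions, a single guide RNA is unlikely to cut within the hexamer and that a paired-guide deletion might be needed instead.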

Screens for functional elements in 3′ UTRs

CRISPR–Cas technology was also applied to functionally interrogate 3′ UTR sequences. A proof-of-concept study in Drosophila melanogaster showed that tiling libraries of guide RNAs targeting a 3′ UTR can be used to identify cis-regulatory elements164. However, owing to redundancy of regulatory elements, small insertions or deletions in non-coding sequences often have limited effects on gene expression. In addition, tiling screens in organisms with large genomes, such as humans or mice, currently do not provide sufficient coverage because of limited availability of unique targeting sequences. A recent innovative approach overcame the limitation of low coverage55. The study authors screened C. elegans for mutations in non-coding regulatory regions, including 3′ UTRs, that are associated with fitness or morphological phenotypes. Through random pairing of guide RNAs in their regions of interest, they generated many diverse sets of deletions. Deletion mapping in worms with specific phenotypes generated a mutagenesis profile and allowed the identification of hotspots with regulatory potential55. This novel approach identifies non-coding cis-regulatory elements and their interdependency.

APA therapeutics

The implication of aberrant 3′ UTR isoform expression patterns in human disease, in particular cancer, has created a strong interest in manipulating APA pharmacologically. In this section, we discuss new chemical compounds and sequence-based therapeutics that have the potential to be used clinically to correct both global and gene-specific APA dysregulation.

Compounds that shift APA

With the discovery of disease-associated changes in 3′ UTRs26,33,34,111,129,151, APA deregulation presents a new target for therapeutic intervention. Reporter assays identified compounds that broadly shift APA towards use of proximal PASs165. In addition to new molecules, the compounds included topoisomerase inhibitors that were previously reported to modulate APA166. This result suggests that some of the currently used drugs may have a substantial impact on APA. However, so far, large-scale studies that evaluate drug-induced changes in APA are lacking.

The recent development of computational methods to analyse APA from scRNA-seq data may help to identify small molecules that predominantly target isoform expression instead of gene expression. For example, the compound JTE-607 was developed two decades ago, and in preclinical studies caused selective lethality in acute leukaemia and Ewing sarcoma cells167. Recently, it was recognized that the small molecule binds CPSF3, the endonuclease of the CPSF complex, which causes extensive readthrough transcription18,167.

Other known drugs, including molecules that alter DNA structure, chromatin or Pol II processivity, such as inhibitors of topoisomerases, histone deacetylases, or CDK12 and CDK13, were shown to cause shifts in APA33,34,80,166,168. However, in most cases, the mechanism of action by which these compounds alter APA is not known. Owing to their DNA damage-inducing capacity, these compound classes are used as anticancer drugs, but it is currently unclear whether the changes in APA enhance or counteract the intended treatment effects170.

RNA therapeutics

A more gene-specific therapeutic approach involves the use of ASOs to modulate APA in specific disease genes. Promising proof-of-concept studies in neuroblastoma, prostate cancer and muscular dystrophy effectively suppressed PAS use in cell culture and in mice152,155,156,157. For example, ASOs targeting the only PAS in DUX4 diminish overall transcript maturation and can be used to correct DUX4 overexpression in muscular dystrophy155,156.

DNA-based and RNA-based strategies for vaccines, cancer immunotherapies and gene replacement therapies are being intensively explored. For example, 3′ UTR sequences optimized for stability have been identified and used in the Pfizer–BioNTech mRNA vaccine against severe acute respiratory syndrome coronavirus 2 (ref.169). Better knowledge of the function of 3′ UTR-dependent gene regulation may inspire improved sequence design that includes isoform-specific and cell compartment-specific targeting.

Conclusion and future perspective

In this Review, we have discussed how the integration of different regulatory mechanisms gives rise to highly specific and dynamic APA landscapes. Further research is required to gain more insights into gene-specific isoform regulation in different conditions, which could help to address some remaining open questions, such as how the activity of APA-regulating enzymes is confined to particular gene loci.

Although only a handful of APA events have been thoroughly investigated on a functional level, they provide a glimpse of the potential contribution of APA regulation to phenotypic complexity. Resolving the expression patterns of 3′ UTR isoforms will help to elucidate gene functions and their impact on human health. Importantly, changes in APA can also modify the cellular response to external cues. Although it is known that cancer is frequently characterized by 3′ UTR shortening, the extent of cooperativity between APA and oncogenic pathways needs to be further characterized, which could lead to the development of new therapeutic approaches that target APA regulators.