A new role for expressed pseudogenes as ncRNA: regulation of mRNA stability of its homologous coding gene
- Cite this article as:
- Yano, Y., Saito, R., Yoshida, N. et al. J Mol Med (2004) 82: 414. doi:10.1007/s00109-004-0550-3
We previously generated a mutant mouse in the course of making a transgenic line; heterozygotes exhibited failure to thrive, severe bone deformities, and polycystic kidneys. This mutant mouse provided a clue to a unique role of expressed pseudogenes. In this mutant the transgene was integrated in the vicinity of the expressed pseudogene of Makorin1, called Makorin1-p1. This insertion reduced transcription of Makorin1-p1, resulting in destabilization of the Makorin1 mRNA in trans via a cis-acting RNA decay element within the 5′ region of Makorin1 that is homologous between Makorin1 and Makorin1-p1. These findings demonstrate a novel and specific regulatory role of an expressed pseudogene as well as functional significance for noncoding RNAs. Next, we developed an original algorithm to determine how many pseudogenes are expressed. Based on our examination, 2–3% of human processed pseudogenes are expressed under the strictest criteria. Interestingly, the mouse has a much smaller proportion of expressed pseudogenes (0.5–1%). Pseudogenes are functionally less constrained and have accumulated more mutations than translated genes. If they have some function in gene regulation, this property would allow more rapid functional diversification than in protein-coding genes. In addition, some genetic phenomena that exhibit incomplete penetrance might be attributed to “mutation” or “variation” of pseudogenes.
Keywords: Pseudogene, ncRNA, mRNA decay, Junk DNA, Evolution
Abbreviations: EST, expressed sequence tag
The complete sequencing of the genomes of eukaryotes and prokaryotes allows the examination and comparison of the whole genome and proteome of these species [1, 2, 3, 4, 5, 6, 7]. Proteome analysis has revealed that the total number of protein domain sequence families appears to vary much less between organisms than the overall proteome size. For example, yeast, worm, fruit fly, and human seem to contain similar proteome sizes despite the wide range of differences of annotated genes [2, 8, 9, 10, 11]. In other words, extensive redundancy of gene families is observed at the individual gene level. How has this redundancy been generated during evolution? The major contributing mechanism of gene diversification appears to be duplication of the genome, which provides an additional copy of each gene and a new gene by exon shuffling. The features and frequencies of duplication vary among organisms: yeast and some plants display a high frequency of segmental chromosome duplication, whereas in the human genome there is a much higher occurrence of local chromosome duplication. Although genome duplication plays an important role in the generation of gene families or new genes, chromosome rearrangements sometimes disable gene function by disruption of gene structure or regulatory regions or by the subsequent accumulation of modifications including mutations, insertions, deletions, and frame shifts. These disabled copies of genes or decayed remnants of genes that do not produce full-length proteins are duplicated or nonprocessed pseudogenes [13, 14]. Nonprocessed pseudogenes are observed ubiquitously in various species since they appear to be generated by genomic duplication.
Another type is the processed pseudogene, which is generated by reverse transcription of an mRNA transcript with subsequent reintegration of the cDNA into the genome. In contrast to nonprocessed pseudogenes, processed pseudogenes display several characteristic features, including an intronless structure, the presence of a poly-A tail, and terminal duplication. The apparent coding frame of a processed pseudogene acquires modifications including mutations, insertions, deletions, and frame shifts, resulting in a functionally inactive pseudogene. Processed pseudogenes have been observed only in metazoan animals and flowering plants and presumably arise from mRNA transcripts in the germ-line cell lineage [13, 14]. In humans they are probably made as a by-product of long interspersed nuclear element retrotransposition [15, 16, 17]. It is noteworthy that the human genome contains close to 10,000 processed pseudogenes [13, 14, 18]. Interestingly, the proportion of pseudogenes in the human genome is much higher than in that of other organisms [19, 20, 21] (http://bioinfo.mbb.yale.edu/genome/pseudogene/).
Pseudogenes are thought to be important sequences for the study of molecular evolution as “molecular fossils.” They provide a record of how genomic DNA has changed without evolutionary pressure and have been used as a model for determining the underlying rates of nucleotide substitution, insertion, and deletion in the greater genome [22, 23, 24]. However, accumulating evidence indicates that at least some pseudogenes are expressed, suggesting that they may have a functional role beyond their importance as molecular fossils.
Expressed pseudogenes are part of a broader class of regulatory, noncoding RNAs (nc-RNAs) that have recently been identified [25, 26, 27]. Of varying lengths, nc-RNAs have no long open reading frame. While not encoding proteins, they may act as riboregulators, and their main function may be the posttranscriptional regulation of gene expression. Many nc-RNAs have been identified and characterized in both prokaryotes and eukaryotes and are involved in the specific recognition of cellular nucleic acid targets through complementary base pairing, controlling cell growth and differentiation. Some are associated with the abnormalities in imprinted inheritance that occur in several well-known developmental and neurobehavioral disorders [28, 29, 30, 31, 32]. Originally these regulatory mechanisms were probably established to protect organisms from virus infection. However, recent examples of posttranscriptional regulation of gene expression by such nc-RNAs further suggest that expressed pseudogenes have some functional importance.
We reported an example of an expressed pseudogene acting as a regulator of the mRNA stability of its homologous coding gene. In this study we uncovered a unique role of an expressed pseudogene in the regulation of mRNA stability in a transgene insertional mouse mutant exhibiting polycystic kidneys and bone deformities. In this mutant the transgene is integrated in the vicinity of the expressed pseudogene of Makorin1, called Makorin1-p1. This insertion reduces transcription of the Makorin1-p1 pseudogene, resulting in destabilization of the Makorin1 mRNA in trans via a cis-acting RNA decay element within the 5′ region of Makorin1 that is homologous between Makorin1 and Makorin1-p1. Either Makorin1 or Makorin1-p1 transgenes rescue these phenotypes. These findings demonstrate a novel and specific regulatory role of an expressed pseudogene, as well as functional significance for ncRNAs. In addition, we devised a new algorithm to search for expressed pseudogenes in expressed sequence tag (EST) databases. Through this analysis we found that a substantial proportion of processed pseudogenes are expressed, and that the proportion of expressed pseudogenes is significantly higher in the human than in the mouse. These results suggest that expression of pseudogenes contributes to the complexity of gene regulation in higher organisms.
Example of active role of expressed pseudogene
We performed a genetic examination and demonstrated that the phenotype was transmitted in a genomic imprinting fashion. We further examined expression of the transgene by northern blot to determine whether the transmission pattern of the transgene influences its expression. There was no difference in transgene expression regardless of the parental origin of the transgene (Fig. 1D). Although we made several transgenic lines and confirmed expression of the transgene in all lines, only one particular line exhibited these pleiotropic phenotypes. Thus we conclude that these phenotypes are likely attributable to gene inactivation by transgene insertion rather than to a function of the transgene.
Identification of the mutated gene
General examination of expressed pseudogenes of a processed type
Next, information on the homology of each sequence to known proteins was obtained. Then each sequence was subjected to a BLAST search against the GenBank EST database (human or mouse, E value=e-50). For an accurate assessment of the homology, the most significant EST hit was subjected to further analysis. In particular, local alignment using program search from the FASTA package [38, 39] was invoked to calculate identity and overlap (length of homologous region) between the pseudogene and the EST sequence. It is often difficult to distinguish whether sequence mismatches in the alignment are due to sequence divergence or to sequencing errors. However, many sequencing errors are located in the terminal regions of a sequence. To account for these errors we identified “consecutive match regions,” that is, regions aligned with no mismatch. A high percentage (e.g., >90%) of the pseudogene sequence covered by the longest consecutive match region implies that the pseudogene and EST sequences are very homologous, and that sequence mismatches are located in the terminal regions, where there may be many sequencing errors. Finally, for each sequence we obtained the identity to the known protein and to the EST sequence, the overlap with the EST sequence, the length of the longest consecutive match region, and the percentage of the pseudogene covered by the longest consecutive match region. We set several thresholds for these parameters, and pseudogenes satisfying the thresholds were assumed to be expressed.
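The "consecutive match region" and threshold steps described above can be illustrated with a minimal sketch. It assumes that a pairwise alignment of the pseudogene and its best EST hit is already available as two gapped strings of equal length. The function names, the dictionary keys, and all threshold values are hypothetical placeholders and are not the values used in the study. The percentage is taken here relative to the full pseudogene length, which is one possible reading of the text.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    # Hypothetical cut-offs for illustration only; not the published values.
    min_protein_identity: float = 0.0
    min_est_identity: float = 95.0
    min_longest_run: int = 100
    min_longest_run_pct: float = 90.0


def longest_consecutive_match(aligned_a: str, aligned_b: str) -> int:
    """Length of the longest run of identical, ungapped alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    best = run = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == b and a != "-":
            run += 1
            best = max(best, run)
        else:
            run = 0  # a mismatch or a gap ends the current run
    return best


def summarize(aligned_pseudo: str, aligned_est: str, pseudo_length: int) -> dict:
    """Collect the alignment-derived parameters for one pseudogene/EST pair."""
    columns = len(aligned_pseudo)
    matches = sum(
        1
        for a, b in zip(aligned_pseudo.upper(), aligned_est.upper())
        if a == b and a != "-"
    )
    longest = longest_consecutive_match(aligned_pseudo, aligned_est)
    return {
        "est_identity": 100.0 * matches / columns if columns else 0.0,
        "overlap": columns,
        "longest_run": longest,
        # Assumption: percentage is relative to the whole pseudogene length.
        "longest_run_pct": 100.0 * longest / pseudo_length if pseudo_length else 0.0,
    }


def is_expressed(summary: dict, protein_identity: float, t: Thresholds) -> bool:
    """Apply all thresholds; a pseudogene passing every one is called expressed."""
    return (
        protein_identity >= t.min_protein_identity
        and summary["est_identity"] >= t.min_est_identity
        and summary["longest_run"] >= t.min_longest_run
        and summary["longest_run_pct"] >= t.min_longest_run_pct
    )
```

Because a run is reset by any mismatch or gap, errors clustered at the sequence ends shorten the run only slightly, whereas divergence spread along the sequence breaks it into many short pieces.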
Estimated number of expressed pseudogenes based on several criteria: thresholds for parameters to predict pseudogenes (columns 1–4) and number and percentage of pseudogenes which satisfy the given thresholds (columns 5–8)
Identity to known protein
Identity to EST sequence
Length of longest consecutive match region
Percentage of longest consecutive match region
Number of sequences satisfying the thresholds
Percentage of sequences satisfying the thresholds (%)
Expressed pseudogene stabilizes mRNA by its homologous region
Modulation of mRNA stability provides a powerful means for controlling gene expression during cell growth and differentiation as well as other physiological transitions [40, 41]. Cytoplasmic mRNA stability is regulated in at least two different ways. The first is the unique differential and selective decay rates characteristic of different mRNAs. A second level of regulation involves changes in the stability of a given mRNA in response to a wide variety of extracellular stimuli. To unravel the underlying processes of regulated mRNA turnover with precision a detailed analysis of the major components and mechanistic steps involved in mRNA turnover is required, for example, the cellular and extracellular stimuli or signals that trigger mRNA decay, cis-acting elements, enzymes, and other auxiliary trans-acting factors.
The 5′ cap structure is an important determinant of the stability of all messages. The 5′-5′ triphosphate linkage that characterizes this structure makes the body of the mRNA intrinsically resistant to degradation by 5′ to 3′ exonucleases. Translation initiation factor eIF4F, a multisubunit factor comprising eIF4G, eIF4A, and eIF4E, plays an important role in the regulation of decapping. The eIF4E subunit crosslinks specifically to the cap structure and binds to cap-affinity columns. This factor is therefore expected to have a negative effect on the accessibility of the cap structure to the Dcp1p/Dcp2p complex [42, 43]. For several transcripts, including both stable and unstable mRNAs, decapping does not normally occur until the poly(A) tail is shortened to an oligo(A). Accordingly, loss of the poly(A) tail triggers subsequent mRNA degradation.
The deadenylation of these mRNAs depends at least in part upon specific cis-acting elements found either in the coding region or, more frequently, in the 3′ UTR [44, 45, 46]. The 3′ UTR cis-acting destabilizing elements can be quite variable in sequence and length, but some are characterized by AU-rich regions (AUUUA repeats). Many proto-oncogene, cytokine, and lymphokine mRNAs, including those of c-fos, c-myc, IL4, IL6, GM-CSF, TNF-α, and IL3, carry this cis-acting element. Instability elements can also be found in the coding region of mRNAs. For example, two destabilizing regions within the c-fos protein coding region, termed CRD-1 and CRD-2, have been identified, and CRD-1 is the major determinant (mCRD) [44, 45]. Specifically inhibiting the translation of a reporter mRNA bearing either the entire c-fos protein coding region or only the mCRD by insertion of a stable stem-loop upstream of the translation initiation codon led to full stabilization of the message [45, 46]. Protein complexes bind these regions and control subsequent deadenylation, which triggers mRNA decay. Therefore, if regions of expressed pseudogenes overlap with such regulatory regions, they could protect the mRNA from the degradation machinery.
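As a small illustration of the AU-rich elements mentioned above, the following sketch locates AUUUA pentamers in a sequence. The function name is hypothetical, and the presence of a pentamer alone does not establish that a transcript is unstable; this is only a way to flag candidate regions for closer inspection.

```python
import re
from typing import List


def find_auuua_motifs(utr: str) -> List[int]:
    """Return 0-based start positions of every AUUUA pentamer.

    Overlapping pentamers (as in AUUUAUUUA) are all reported, and DNA
    input is accepted by treating T as U.
    """
    rna = utr.upper().replace("T", "U")
    # A zero-width lookahead lets the regex report overlapping matches.
    return [m.start() for m in re.finditer(r"(?=AUUUA)", rna)]


if __name__ == "__main__":
    print(find_auuua_motifs("GCAUUUAUUUAUUUAGC"))
```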
Posttranscriptional gene regulation by nc-RNA
nc-RNAs are defined as RNA molecules that function directly as structural, catalytic, or regulatory RNAs rather than as mRNAs that encode proteins [25, 26, 27]. The presence of nc-RNA has been recognized since the 1950s, when most cellular RNA was found in discrete particles in the cytoplasm, which were later shown to be the site of protein synthesis and called ribosomes. Another class of functional RNA was predicted by Francis Crick’s “adaptor” hypothesis. These RNAs later proved to be Crick’s adaptors: the transfer RNAs. Several abundant, small nc-RNAs other than rRNA and tRNA were detected and isolated biochemically, among them the uridine (U)-rich U RNAs [49, 50]. Many of these small RNAs are associated with proteins to form ribonucleoprotein (RNP) complexes. Characterization of small RNPs was aided by the discovery that certain patients with autoimmune diseases, such as systemic lupus erythematosus, produce anti-RNP autoantibodies that could be used to immunoprecipitate small RNPs. Many of the abundant small RNPs precipitated by these antisera, namely U1, U2, U4, U5, and U6 small nuclear RNA (snRNA), turned out to be components of the spliceosome, involved in splicing mRNAs [51, 53].
New nc-RNAs continue to appear; among the more fascinating stories is the discovery that RNAs have roles in chromatin structure. A typical example is the human XIST (X-inactive-specific transcript) gene, which encodes a 17-kb nc-RNA with a key role in dosage compensation and X-chromosome inactivation. Drosophila melanogaster also seems to control dosage compensation using small chromatin-associated roX (RNA on the X) RNAs. Interestingly, several large nc-RNAs are clustered in the imprinted regions of vertebrate chromosomes, including the IPW (imprinted in Prader-Willi syndrome) and H19 (H19, imprinted maternally expressed untranslated mRNA) transcripts [57, 58]. The imprinted Prader-Willi crucial region seems to be especially rich in nc-RNAs [59, 60]. Many of these other RNAs are cis-antisense RNAs that overlap coding genes on the other genomic strand. Such nc-RNAs are not limited to the imprinted regions of mammals. Various cis-antisense RNAs have been observed in prokaryotes, plants, and animals, and their roles are unlikely to be limited to those in imprinting and chromatin structure.
Recently, two classes of nc-RNAs that can play important regulatory roles in animals and plants by targeting mRNAs for cleavage or translational repression have been introduced as tools to explore gene function in vivo. The story of this new strategy for regulating gene expression in vivo began with the identification of the lin-4 (lineage-abnormal-4) and let-7 (lethal-7) RNAs in C. elegans [64, 65]. The lin-4 and let-7 RNAs are unusual because they are expressed as 22-nt RNAs, having been processed from approx. 70-nt precursor hairpins. MicroRNAs (miRNAs) are endogenously encoded small nc-RNAs, derived by processing of short RNA hairpins, that can inhibit the translation of mRNAs bearing partially complementary target sequences. In contrast, small interfering RNAs (siRNAs), which are derived by processing of long double-stranded RNAs and are often of exogenous origin, degrade mRNAs bearing fully complementary sequences. It seems that miRNAs are more likely to function as translational repressors than to direct mRNA degradation as siRNAs do. RNAi has been suggested to function as a primitive immune system against RNA viruses and retrotransposons [66, 67].
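The distinction between partial and full complementarity can be made concrete with a toy sketch that slides the reverse complement of a small RNA along a target and reports the best ungapped site. The function names and the example sequences are hypothetical. Only Watson-Crick pairs are counted, so G:U wobble pairing, bulges, and seed-region rules are ignored, which is a deliberate simplification.

```python
from typing import Optional, Tuple

# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA (DNA input is accepted, T is read as U)."""
    return rna.upper().replace("T", "U").translate(COMPLEMENT)[::-1]


def best_site(small_rna: str, target: str) -> Optional[Tuple[int, int]]:
    """Return (start, mismatches) of the best ungapped site on the target.

    Zero mismatches corresponds to the fully complementary, siRNA-like case;
    a small non-zero count corresponds to a partially complementary,
    miRNA-like site. Returns None if the target is shorter than the small RNA.
    """
    site = reverse_complement(small_rna)
    target = target.upper().replace("T", "U")
    n = len(site)
    if n == 0 or len(target) < n:
        return None
    best = None
    for i in range(len(target) - n + 1):
        mismatches = sum(1 for x, y in zip(site, target[i:i + n]) if x != y)
        if best is None or mismatches < best[1]:
            best = (i, mismatches)
    return best
```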
The discovery of RNA catalysis and the “RNA world” hypothesis for the origin of life provide a plausible explanation for why rRNA and tRNA are at the core of the translation machinery: perhaps they are the frozen evolutionary relics of the invention of the ribosome by an RNA-based “ribo-organism.” Other known nc-RNAs have also been proposed to be ancient relics of the last ribo-organisms. Many functional roles do not require the more sophisticated catalytic prowess of proteins and could be carried out by simple RNAs. nc-RNAs are often found to have roles that involve sequence-specific recognition of another nucleic acid. Posttranscriptional regulation in particular can be achieved simply by steric occlusion of sites on a target pre-mRNA or mature RNA. Thus nc-RNA would be well adapted for regulatory roles as the mastermind in the “DNA-protein world.” Expressed pseudogenes might be playing an important role as an actor on that stage.
We have uncovered a unique role of an expressed pseudogene as a regulator of mRNA stability. An active role of expressed pseudogenes has been observed in other organisms. For example, neuronal expression of neural nitric oxide synthase protein is suppressed by an antisense RNA transcribed from a nitric oxide synthase pseudogene. Therefore posttranscriptional gene regulation by expressed pseudogenes might be a general regulatory phenomenon.
In the last few years extensive data have accumulated showing that various RNA transcripts are synthesized in different cells. Some of them lack protein-coding capacity and seem to act at the RNA level. Initially, the biological role of these ncRNAs was mysterious. Recent whole-genome sequencing efforts combined with transcription profiling and bioinformatics strategies have led to the notion that ncRNAs play important roles in the regulation of mRNA stability. Further analysis of such ncRNAs, including expressed pseudogenes, will likely lead to further understanding of the layers of complexity used in biological organisms to exquisitely regulate gene expression.
We thank Dr. Masami Muramatsu, Dr. Munehisa Ueno, Dr. Nobuhiro Deguchi, and Dr. Yoshihiko Funae for generous support. We also thank Tomohito Itoh, Takumi Matsumoto, Shinji Sasaki, Michiyo Ishida, and Yuzuru Yamauchi for technical support.