Journal of Molecular Medicine, Volume 82, Issue 7, pp 414–422

A new role for expressed pseudogenes as ncRNA: regulation of mRNA stability of its homologous coding gene

Authors

  • Yoshihisa Yano
    • Department of Genetic Disease Research, Osaka City University Graduate School of Medicine
  • Rintaro Saito
    • Institute for Advanced Biosciences, Keio University
  • Noriyuki Yoshida
    • Department of Chemical Biology, Osaka City University Graduate School of Medicine
  • Atsushi Yoshiki
    • Experimental Animal Division, Department of Biological Systems, BioResource Center, RIKEN Tsukuba Institute
  • Anthony Wynshaw-Boris
    • Departments of Pediatrics and Medicine, UCSD Cancer Center, San Diego School of Medicine, University of California
  • Masaru Tomita
    • Institute for Advanced Biosciences, Keio University
    • Department of Genetic Disease Research, Osaka City University Graduate School of Medicine
Review

DOI: 10.1007/s00109-004-0550-3

Cite this article as:
Yano, Y., Saito, R., Yoshida, N. et al. J Mol Med (2004) 82: 414. doi:10.1007/s00109-004-0550-3

Abstract

We previously generated a mutant mouse in the course of making a transgenic line; heterozygotes of this line exhibited failure to thrive, severe bone deformities, and polycystic kidneys. This mutant mouse provided a clue to a unique role of expressed pseudogenes. In this mutant the transgene was integrated into the vicinity of the expressed pseudogene of Makorin1 called Makorin1-p1. This insertion reduced transcription of Makorin1-p1, resulting in destabilization of the Makorin1 mRNA in trans via a cis-acting RNA decay element within the 5′ region of Makorin1 that is homologous between Makorin1 and Makorin1-p1. These findings demonstrate a novel and specific regulatory role of an expressed pseudogene as well as functional significance for noncoding RNAs. Next, we developed an original algorithm to determine how many pseudogenes are expressed. Based on our examination, 2–3% of human processed pseudogenes are expressed under the strictest criteria. Interestingly, the mouse has a much smaller proportion of expressed pseudogenes (0.5–1%). Pseudogenes are functionally less constrained and have accumulated more mutations than translated genes. If they have some function in gene regulation, this property would allow more rapid functional diversification than in protein-coding genes. In addition, some genetic phenomena that exhibit incomplete penetrance might be attributed to “mutation” or “variation” of pseudogenes.

Keywords

Pseudogene, ncRNA, mRNA decay, Junk DNA, Evolution

Abbreviations

EST: Expressed sequence tag

RNP: Ribonucleoprotein

UTR: Untranslated region

Introduction

The complete sequencing of the genomes of eukaryotes and prokaryotes allows the examination and comparison of the whole genome and proteome of these species [1, 2, 3, 4, 5, 6, 7]. Proteome analysis has revealed that the total number of protein domain sequence families appears to vary much less between organisms than the overall proteome size. For example, yeast, worm, fruit fly, and human seem to contain similar proteome sizes despite the wide range of differences of annotated genes [2, 8, 9, 10, 11]. In other words, extensive redundancy of gene families is observed at the individual gene level. How has this redundancy been generated during evolution? The major contributing mechanism of the gene diversification appears to be duplication of the genome, which provides an additional copy of each gene and a new gene by exon shuffling. The features and frequencies of duplication vary in the organisms: yeast and some plants display a high frequency of segmental chromosome duplication [12], whereas in the human genome there is much higher occurrence of local chromosome duplication [2]. Although genome duplication plays an important role in the generation of gene families or new genes, chromosome rearrangements sometimes disable gene function by disruption of gene structure or regulatory regions or by the subsequent accumulation of modifications including mutations, insertions, deletions, and frame shifts. These disabled copies of genes or decayed remnants of genes that do not produce full-length proteins are duplicated or nonprocessed pseudogenes [13, 14]. Nonprocessed pseudogenes are observed ubiquitously in various species since they appear to be generated by genomic duplication.

Another type is processed pseudogenes, which are generated by reverse transcription of an mRNA transcript with subsequent reintegration of the cDNA into the genome. In contrast to nonprocessed pseudogenes, processed pseudogenes display several features including an intronless structure, the presence of a poly-A tail, and terminal duplication. The apparent coding frame of the processed pseudogenes acquires modifications including mutations, insertions, deletions, and frame shifts, resulting in a functionally inactive pseudogene. Processed pseudogenes have been observed only in metazoan animals and flowering plants and presumably arise from mRNA transcripts in the germ-line cell lineage [13, 14]. In humans they are probably made as a by-product of long interspersed nuclear element retrotransposition [15, 16, 17]. It is noteworthy that the human genome contains close to 10,000 processed pseudogenes [13, 14, 18]. Interestingly, the proportion of pseudogenes in the human genome is much higher than in that of other organisms [19, 20, 21] (http://bioinfo.mbb.yale.edu/genome/pseudogene/).

Pseudogenes are thought to be important sequences for the study of molecular evolution as “molecular fossils.” They provide a record of how genomic DNA has changed without evolutionary pressure and have been used as a model for determining the underlying rates of nucleotide substitution, insertion, and deletion in the greater genome [22, 23, 24]. However, accumulating evidence indicates that at least some pseudogenes are expressed, suggesting that they may have a functional role beyond their importance as molecular fossils.

Expressed pseudogenes are part of a broader class of regulatory, noncoding RNAs (nc-RNAs) that have recently been identified [25, 26, 27]. Of varying lengths, nc-RNAs have no long open reading frame. While not encoding proteins, they may act as riboregulators, and their main function may be the posttranscriptional regulation of gene expression. Many nc-RNAs have been identified and characterized in both prokaryotes and eukaryotes and are involved in the specific recognition of cellular nucleic acid targets through complementary base pairing, controlling cell growth and differentiation. Some are associated with the abnormalities in imprinted inheritance that occur in several well-known developmental and neurobehavioral disorders [28, 29, 30, 31, 32]. Originally these regulatory mechanisms were probably established to protect organisms from virus infection. However, recent examples of posttranscriptional regulation of gene expression by such nc-RNAs further suggest that expressed pseudogenes have some functional importance.

We previously reported an example of an expressed pseudogene acting as a regulator of the mRNA stability of its homologous coding gene [33]. In that study we uncovered a unique role of an expressed pseudogene in the regulation of mRNA stability in a transgene insertional mouse mutant exhibiting polycystic kidneys and bone deformities. In this mutant the transgene is integrated into the vicinity of the expressed pseudogene of Makorin1 called Makorin1-p1. This insertion reduces transcription of the Makorin1-p1 pseudogene, resulting in destabilization of the Makorin1 mRNA in trans via a cis-acting RNA decay element within the 5′ region of Makorin1 that is homologous between Makorin1 and Makorin1-p1. Either Makorin1 or Makorin1-p1 transgenes rescue these phenotypes. These findings demonstrate a novel and specific regulatory role of an expressed pseudogene, as well as functional significance for ncRNAs. In addition, we devised a new algorithm to search for expressed pseudogenes in expressed sequence tag (EST) databases. Through this analysis we found that a substantial proportion of processed pseudogenes are expressed, and that the proportion of expressed pseudogenes is significantly higher in the human than in the mouse. These results suggest that expression of pseudogenes contributes to the complexity of gene regulation in higher organisms.

Example of active role of expressed pseudogene

In the course of making a transgenic mouse carrying the Drosophila sex-lethal gene, we incidentally found in one line of transgenic mice that approximately 80% of heterozygotes die within 2 days of birth. Survivors of this critical period failed to thrive and exhibited severe bone deformities (Fig. 1A). All heterozygotes exhibited reduced mineralization. The cortex of the tibial bone was remarkably attenuated, and its trabeculae were delicate, disorganized strands. These histological and macroscopic features are similar to those found in osteogenesis imperfecta. Mutant heterozygotes also developed progressive renal polycystic dilation (Fig. 1B), with microscopic liver cysts (data not shown). Finally, the epithelial cover on the eyes of mutant mice was not completed (Fig. 1C), resulting in an open eye phenotype in later stage embryos.
Fig. 1

Phenotypes of the mutant generated by insertional mutagenesis. A Macroscopic findings of wild-type (+/+) or transgenic (+/−) littermates. B Histological examination of kidney (hematoxylin-eosin) at several developmental stages in heterozygous (+/−) and control (+/+) mice. Progressive cystic dilatation accompanied by eosinophilic droplets in the cells was observed in the heterozygote. C At P0, all heterozygotes (+/−) exhibited loss of the eye closure seen in wild-types (+/+). D Examination of transgene expression by northern blot. A cDNA fragment of sex-lethal was used to detect expression of the transgene. Total mRNA was extracted from E15.5 mouse embryos. The result indicated that there is no differential expression of the transgene associated with transmission pattern

We performed a genetic examination and demonstrated that the phenotype was transmitted in an imprinted fashion [33]. We further examined expression of the transgene by northern blot to determine whether the transmission pattern of the transgene influences its expression. There was no difference in transgene expression regardless of the parental origin of the transgene (Fig. 1D). Although we made several transgenic lines and confirmed expression of the transgene in all lines, only one particular line exhibited these pleiotropic phenotypes. Thus we conclude that these phenotypes are likely attributable to gene inactivation by transgene insertion rather than to a function of the transgene.

Identification of the mutated gene

The chromosomal location of the transgene was mapped in radiation hybrids, demonstrating that the transgene is tightly linked to D5Mit237, a location confirmed by fluorescent in situ hybridization. The genome structure of the transgene locus was characterized (Fig. 2A). One copy of each of two transgenes was integrated into the genome in a head-to-head fashion, accompanied by a 7-kb deletion. Three potential transcription units were identified through characterization of the transgene insertion site. Each candidate was tested for imprinted expression. Northern blot analysis indicated that neither transcript1 nor deoxycytidine kinase (DCK) was imprinted. Then we examined the expression of Makorin1-p1, which is a pseudogene of Makorin1. Makorin1 was originally identified as a gene with strong homology with ZNF127, which is located in the critical region of the Prader-Willi syndrome [34, 35]. Two kinds of Makorin1 transcripts are generated by alternative polyadenylation and contain different 3′ untranslated regions (UTRs), although the protein coding regions are identical. Surprisingly, the short form of Makorin1 as well as Makorin1-p1 was significantly reduced in a heterozygous embryo derived from the mating between a mutant male and a wild-type female (±) compared to a wild-type embryo. We further confirmed expression of Makorin1-p1 by northern blot using oligonucleotides (Fig. 2B). Thus we concluded that Makorin1-p1 is transcribed in wild-type animals and is reduced in expression in the mutant. We further confirmed imprinted expression of Makorin1-p1 by mating with wild M. spretus and M. molossinus mice, since no detectable polymorphisms were identified among laboratory mice in the expressed region [33]. Curiously, we were able to find polymorphisms outside of the expressed region, suggesting conservation pressure on the expressed region.
Fig. 2

Identification of the mutated gene. A Schematic presentation of the genomic organization of the transgene insertion locus, illustrating the transgene, the relative orientation of transcripts from three candidate genes, and the probes used for screening of the BAC library. B Comparison of mRNA amounts of Makorins by northern blot. To confirm and compare the amounts of each transcript we performed northern blot analysis of mRNA extracted from E15.5 embryos using oligonucleotides which were synthesized from the common region of Makorin1 and Makorin1-p1 (a) or from the specific region of Makorin1-p1 (b, c). Above Sequence of oligonucleotide; left transcript. Total RNA (T) or poly-A enriched RNA (A) is indicated at the top

General examination of expressed pseudogenes of a processed type

This example suggested that other expressed pseudogenes might have similar functions. Therefore we devised an original algorithm to extract expressed pseudogenes from public databases. The basic strategy to estimate the number of expressed pseudogenes is to perform homology searches for each pseudogene sequence against EST databases. First, 7,868 human and 4,476 mouse sequences predicted as processed pseudogenes were downloaded from http://bioinfo.mbb.yale.edu/genome/pseudogene/. The method used to predict them is based on comparative analysis of protein and genome sequences [4, 36]. To gain insight into pseudogene function we characterized conservation of pseudogenes between the human and the mouse. The number of mouse pseudogenes which are homologous to human pseudogenes varies with the E value used to find homologs. The number is also influenced by the identity threshold (Fig. 3). If we set the E value to e-30 or e-50, the rates saturate at 20–35% around identity=0.7. Therefore we suggest that 20–35% is an appropriate estimate of the rate of conserved pseudogenes between human and mouse.
Fig. 3

Rate of human pseudogenes whose homologs are found in mouse (human to mouse) and vice versa (mouse to human). Three BLAST E values (e-10, e-30, and e-50) were used to find homologous pseudogenes. The rate of pseudogenes having homologs with identity greater than a given threshold is then plotted
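The conservation rate described above can be illustrated with a minimal Python sketch. It assumes that the best BLAST hits are already available as simple records; the `BestHit` type, the `conserved_rate` function, and the example values are hypothetical and are not taken from the original analysis.

```python
from typing import Iterable, NamedTuple


class BestHit(NamedTuple):
    """One cross-species hit for a query pseudogene (hypothetical record layout)."""
    query_id: str
    e_value: float
    identity: float  # fraction between 0 and 1


def conserved_rate(hits: Iterable[BestHit], total_queries: int,
                   max_e_value: float, min_identity: float) -> float:
    """Fraction of query pseudogenes with at least one hit that passes
    both the E value cutoff and the identity threshold."""
    if total_queries <= 0:
        raise ValueError("total_queries must be positive")
    conserved = {
        h.query_id
        for h in hits
        if h.e_value <= max_e_value and h.identity > min_identity
    }
    return len(conserved) / total_queries


# Made-up example: four query pseudogenes, three of which have hits.
example_hits = [
    BestHit("h1", 1e-60, 0.85),
    BestHit("h1", 1e-55, 0.80),  # second hit for the same query is counted once
    BestHit("h2", 1e-20, 0.95),
    BestHit("h3", 1e-35, 0.65),
]
rate_strict = conserved_rate(example_hits, 4, max_e_value=1e-30, min_identity=0.7)
rate_loose = conserved_rate(example_hits, 4, max_e_value=1e-10, min_identity=0.7)
```

Sweeping `min_identity` for each E value cutoff would give curves of the kind summarized in Fig. 3.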

Next, information on the homology of each sequence to known proteins was obtained. Then each sequence was subjected to a BLAST search [37] against the GenBank EST database (human or mouse, E value=e-50). For accurate assessment of the homology, the most significant EST hit was subjected to further analysis. In particular, local alignment using the search program from the FASTA package [38, 39] was invoked to calculate identity and overlap (length of homologous region) between the pseudogene and the EST sequence. It is often difficult to distinguish whether sequence mismatches in the alignment are due to sequence divergence or to sequencing errors. On the other hand, many sequencing errors are located in the terminal regions of a sequence. To account for these errors we identified “consecutive match regions,” i.e., regions that are aligned with no mismatch. A high percentage (e.g., >90%) of the longest consecutive match region in the pseudogene sequence implies that the pseudogene and EST sequences are highly homologous, and that sequence mismatches are located in the terminal regions, where there may be many sequencing errors. Finally, for each sequence we obtained the identity to known protein and to EST sequence, the overlap with the EST sequence, the length of the longest consecutive match region, and the percentage of the longest consecutive match region in the pseudogene. We set several thresholds for these parameters, and pseudogenes satisfying the thresholds were assumed to be expressed.
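The alignment parameters described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the original implementation: the alignment is assumed to be supplied as two gapped strings of equal length, the full pseudogene length is passed in separately, and the names `AlignmentStats`, `alignment_stats`, and `passes_thresholds` are hypothetical. The default thresholds mirror one row of Table 1 (length >100, percentage >0.9).

```python
from dataclasses import dataclass


@dataclass
class AlignmentStats:
    identity: float      # matching columns / aligned columns
    overlap: int         # number of aligned columns
    longest_run: int     # longest stretch of consecutive matches
    run_fraction: float  # longest_run / full pseudogene length


def alignment_stats(pseudogene_aln: str, est_aln: str,
                    pseudogene_length: int) -> AlignmentStats:
    """Summarize a gapped pairwise alignment of a pseudogene and an EST."""
    if len(pseudogene_aln) != len(est_aln):
        raise ValueError("aligned strings must have equal length")
    if pseudogene_length <= 0:
        raise ValueError("pseudogene_length must be positive")
    matches = longest = current = 0
    for p, e in zip(pseudogene_aln.upper(), est_aln.upper()):
        if p == e and p != "-":
            matches += 1
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # a mismatch or gap ends the current match run
    overlap = len(pseudogene_aln)
    return AlignmentStats(
        identity=matches / overlap if overlap else 0.0,
        overlap=overlap,
        longest_run=longest,
        run_fraction=longest / pseudogene_length,
    )


def passes_thresholds(stats: AlignmentStats, min_run: int = 100,
                      min_fraction: float = 0.9) -> bool:
    """Apply one illustrative threshold combination."""
    return stats.longest_run > min_run and stats.run_fraction > min_fraction


# Toy alignment: one mismatch in the middle of a 10-nt pseudogene.
toy = alignment_stats("ACGTACGTAA", "ACGTTCGTAA", pseudogene_length=10)
```

In the toy alignment a single internal mismatch halves the longest consecutive match region, whereas a mismatch near either end would leave most of it intact; this is why the measure tolerates errors in the terminal regions.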

The number of pseudogenes predicted to be expressed changes greatly according to the thresholds set for each of these parameters. However, if we count the number of pseudogene sequences which align with EST sequences with few mismatches (i.e., identity to EST sequence is 100% or the percentage of the longest consecutive match region is very high), we can estimate that at least 2–3% and approx. 0.5–1% of pseudogenes are expressed in human and mouse, respectively (Table 1). These data suggest that the human has many more expressed pseudogenes than the mouse.
Table 1

Estimated number of expressed pseudogenes based on several criteria: thresholds for parameters to predict pseudogenes (columns 1–4) and number and percentage of pseudogenes which satisfy the given thresholds (columns 5–8)

Identity to      Identity to    Length of longest   Percentage of longest   Number of sequences         Percentage of sequences
known protein    EST sequence   consecutive match   consecutive match       satisfying the thresholds   satisfying the thresholds (%)
                                region              region                  Human     Mouse             Human     Mouse
<0.9             1              -                   -                       217       51                2.8       1.1
<1               1              -                   -                       264       62                3.4       1.4
<1               -              >300                -                       461       146               5.9       3.3
<1               -              >100                >0.9                    178       33                2.3       0.7
<1               -              -                   >0.9                    178       33                2.3       0.7
<1               -              -                   >0.95                   141       27                1.8       0.6
<0.9             >0.97          -                   -                       775       194               9.9       4.3
Total            -              -                   -                       7868      4476              100.0     100.0
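The percentages in Table 1 follow from dividing each count by the total number of predicted processed pseudogenes for that species (7,868 human, 4,476 mouse). A short Python check of this arithmetic, with the hypothetical helper `percent_expressed`:

```python
# Totals of predicted processed pseudogenes, as stated in the text.
TOTALS = {"human": 7868, "mouse": 4476}


def percent_expressed(count: int, species: str) -> float:
    """Percentage of pseudogenes satisfying a threshold set, to one decimal."""
    return round(100.0 * count / TOTALS[species], 1)


# Identity to known protein <0.9 and identity to EST sequence = 1:
human_strict = percent_expressed(217, "human")  # 217 / 7868
mouse_strict = percent_expressed(51, "mouse")   # 51 / 4476
```

The same calculation applied to the rows with a high percentage of the longest consecutive match region gives the 2–3% (human) and 0.5–1% (mouse) range quoted in the text.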

Expressed pseudogene stabilizes mRNA by its homologous region

We examined the mechanism underlying the ability of Makorin1-p1 to regulate the stability of Makorin1. A potential mechanism is suggested by these data (Fig. 4). Makorin1 carries a destabilizing signal in the 5′ region of the mRNA. Makorin1-p1 overlaps with this region and may protect Makorin1 mRNA from the degradation machinery in a competitive fashion [33].
Fig. 4

Schematic presentation of a proposed mechanism of mRNA stabilization by expressed pseudogenes. mRNA decay is controlled by cis-acting elements of each mRNA. Enzymes and other auxiliary trans-acting factors recognize these regions to facilitate decay of mRNA. Expressed pseudogenes with homology to these cis-acting elements could compete for the binding of these factors with the original protein coding genes and indirectly stabilize mRNAs

Modulation of mRNA stability provides a powerful means for controlling gene expression during cell growth and differentiation as well as other physiological transitions [40, 41]. Cytoplasmic mRNA stability is regulated in at least two different ways. The first is the unique differential and selective decay rates characteristic of different mRNAs. A second level of regulation involves changes in the stability of a given mRNA in response to a wide variety of extracellular stimuli. To unravel the underlying processes of regulated mRNA turnover with precision a detailed analysis of the major components and mechanistic steps involved in mRNA turnover is required, for example, the cellular and extracellular stimuli or signals that trigger mRNA decay, cis-acting elements, enzymes, and other auxiliary trans-acting factors.

The 5′ cap structure is an important determinant of the stability of all messages. The 5′-5′ triphosphate linkage that characterizes this structure makes the body of the mRNA intrinsically resistant to degradation by 5′ to 3′ exonucleases. Translation initiation factor eIF4F, a multisubunit factor comprising eIF4G, eIF4A, and eIF4E, plays an important role in the regulation of decapping. The eIF4E subunit crosslinks specifically to the cap structure and binds to cap-affinity columns. This factor is therefore expected to have a negative effect on the accessibility of the cap structure to the Dcp1p/Dcp2p complex [42, 43]. For several transcripts, including both stable and unstable mRNAs, decapping does not normally occur until the poly(A) tail is shortened to an oligo(A). Accordingly, loss of the poly(A) tail triggers subsequent mRNA degradation.

The de-polyadenylation of these mRNAs depends at least in part upon specific cis-acting elements found either in the coding region or, more frequently, in the 3′ UTR [44, 45, 46]. The 3′ UTR cis-acting destabilizing elements can be quite variable in sequence and length, but some are characterized by AU-rich regions (AUUUA repeats). Many proto-oncogenes, cytokines, and lymphokines, including c-fos, c-myc, IL4, IL6, GM-CSF, TNF-α, and IL3, carry this cis-acting element. Instability elements can also be found in the coding region of mRNAs. For example, two destabilizing regions within the c-fos protein coding region, termed CRD-1 and CRD-2, have been identified, and CRD-1 is the major determinant (mCRD) [44, 45]. Specifically inhibiting the translation of a reporter mRNA bearing either the entire c-fos protein coding region or only the mCRD by insertion of a stable stem-loop upstream of the translation initiation codon led to full stabilization of the message [45, 46]. Protein complexes bind these regions and control subsequent deadenylation, which triggers mRNA decay. Therefore, given that regions of expressed pseudogenes overlap with such regulatory regions, they could protect mRNA from the degradation machinery.

Posttranscriptional gene regulation by nc-RNA

nc-RNAs are defined as RNA molecules which function directly as structural, catalytic, or regulatory RNAs rather than as mRNAs that encode proteins [25, 26, 27]. The presence of nc-RNA has been recognized since the 1950s, when most cellular RNA was found in discrete particles in the cytoplasm [47], which were later shown to be the site of protein synthesis and called ribosomes [48]. Another class of functional RNA was predicted by Francis Crick’s “adaptor” hypothesis [48]. These RNAs later proved to be Crick’s adaptors: the transfer RNAs. Several abundant, small nc-RNAs other than rRNA and tRNA were detected and isolated biochemically, among them the uridine (U)-rich U RNAs [49, 50]. Many of these small RNAs are associated with proteins to form ribonucleoprotein (RNP) complexes [51]. Characterization of small RNPs was aided by the discovery that certain patients with autoimmune diseases, such as systemic lupus erythematosus, produce anti-RNP autoantibodies that could be used to immunoprecipitate small RNPs [52]. Many of the abundant small RNPs precipitated by these antisera, namely U1, U2, U4, U5, and U6 small nuclear RNA (snRNA), turned out to be components of the spliceosome, involved in splicing mRNAs [51, 53].

New nc-RNAs continue to appear; among the more fascinating stories is the discovery that RNAs have roles in chromatin structure [54]. A typical example is the human XIST (X-inactive-specific transcript) gene, which encodes a 17-kb nc-RNA with a key role in dosage compensation and X-chromosome inactivation [55]. Drosophila melanogaster also seems to control dosage compensation using small chromatin-associated roX (RNA on the X) RNAs [56]. Interestingly, several large nc-RNAs are clustered in the imprinted regions of vertebrate chromosomes, including the IPW (imprinted in Prader-Willi syndrome) and H19 (H19, imprinted maternally expressed untranslated mRNA) transcripts [57, 58]. The imprinted Prader-Willi critical region seems to be especially rich in nc-RNAs [59, 60]. Many of these other RNAs are cis-antisense RNAs that overlap coding genes on the other genomic strand. These nc-RNAs are not limited to the imprinted regions of mammals. Various cis-antisense RNAs have been observed in prokaryotes [61], plants [62] and animals [63], and their roles are unlikely to be limited to those in imprinting and chromatin structure.

Recently two classes of nc-RNAs, which can play important regulatory roles in animals and plants by targeting mRNAs for cleavage or translational repression, have been introduced to explore gene function in vivo. The story of this new strategy for regulating gene expression in vivo began with the identification of lin-4 (lineage-abnormal-4) and let-7 (lethal-7) RNAs in C. elegans [64, 65]. The lin-4 and let-7 RNAs are unusual because they are expressed as 22-nt RNAs, having been processed from approx. 70-nt precursor hairpins. MicroRNAs (miRNAs) are endogenously encoded small nc-RNAs, derived by processing of short RNA hairpins, that can inhibit the translation of mRNAs bearing partially complementary target sequences. In contrast, small interfering RNAs (siRNAs), which are derived by processing of long double-stranded RNAs and are often of exogenous origin, degrade mRNAs bearing fully complementary sequences. It seems that miRNAs are more likely to function as translational repressors than to act like siRNAs in directing mRNA degradation. RNAi has been suggested to function as a primitive immune system against RNA viruses and retrotransposons [66, 67].

The discovery of RNA catalysis and the “RNA world” hypothesis for the origin of life provide a plausible explanation for why rRNA and tRNA are at the core of the translation machinery: perhaps they are the frozen evolutionary relic of the invention of the ribosome by an RNA-based “ribo-organism.” Other known nc-RNAs have also been proposed to be ancient relics of the last ribo-organisms. Many functional roles do not require the more sophisticated catalytic prowess of proteins and could be carried out by simple RNAs. nc-RNAs are often found to have roles that involve sequence-specific recognition of another nucleic acid. Posttranscriptional regulation in particular can be achieved simply by steric occlusion of sites on a target pre-mRNA or mature RNA. Thus nc-RNA would be well adapted for regulatory roles as the mastermind in the “DNA-protein world.” Expressed pseudogenes might be playing an important role as an actor/actress on that stage.

Concluding remarks

We have uncovered a unique role of an expressed pseudogene as a regulator of mRNA stability. An active role of expressed pseudogenes has been observed in other organisms. For example, neuronal expression of neural nitric oxide synthase protein is suppressed by an antisense RNA transcribed from a nitric oxide synthase pseudogene [68]. Therefore posttranscriptional gene regulation by expressed pseudogenes might be a general regulatory phenomenon.

In the last few years extensive data have accumulated which show that various RNA transcripts are synthesized in different cells. Some of them lack protein coding capacity and seem to act at the RNA level. Initially, the biological role of those ncRNAs was mysterious. Recent whole-genome sequencing efforts combined with transcription profiling and bioinformatics strategies have led to the notion that ncRNAs play important roles in the regulation of mRNA stability. Further analysis of such ncRNAs, including expressed pseudogenes, will likely lead to further understanding of the layers of complexity used in biological organisms to exquisitely regulate gene expression.

Acknowledgements

We thank Dr. Masami Muramatsu, Dr. Munehisa Ueno, Dr. Nobuhiro Deguchi, and Dr. Yoshihiko Funae for generous support. We also thank Tomohito Itoh, Takumi Matsumoto, Shinji Sasaki, Michiyo Ishida, and Yuzuru Yamauchi for technical support.

Copyright information

© Springer-Verlag 2004