Abstract
As far as protein-coding genes are concerned, there is a non-zero probability that at least one of the five possible overlapping sequences of any gene (the two alternative reading frames on the coding strand and the three reading frames on the complementary strand) will contain an open reading frame (ORF) of a length that may be suitable for coding a functional protein. It is, however, very difficult to determine whether or not such an ORF is functional. Recently, we proposed a method that predicts the functionality of an overlapping ORF by testing whether it has been subject to purifying selection during its evolution. Here, we use simulations to test this method under several conditions and compare it with the method of Firth and Brown. We found that under most conditions, our method detects functional overlapping genes with higher sensitivity than Firth and Brown's method, while maintaining high specificity. Further, we tested the hypothesis that the two aminoacyl tRNA synthetase classes originated from a pair of overlapping genes. A central piece of evidence ostensibly supporting this hypothesis is the assertion that an overlapping ORF of a heat shock protein 70 gene, which exhibits some similarity to class 2 aminoacyl tRNA synthetases, is functional. We found a signature of purifying selection only in highly divergent sequences, suggesting that the method yields false positives at high sequence divergence and that the overlapping ORF is not a functional gene. Finally, we examined three cases of overlap in the human genome. We found varying signatures of purifying selection acting on these overlaps, raising the possibility that two of the overlapping genes may not be functional.
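To make the "five possible overlapping sequences" concrete, here is a minimal Python sketch that scans the five reading frames overlapping an annotated coding sequence and reports the longest stop-codon-free stretch in each. It rests on simplifying assumptions: an "ORF" is treated as any run of codons free of the standard stop codons TAA, TAG and TGA (no start-codon requirement), and the frame labels (+2, +3, -1, -2, -3) are an illustrative convention. The function names are hypothetical. This only illustrates how overlapping ORFs arise; it is not the selection-based detection method discussed in the abstract.

```python
# Minimal sketch (hypothetical helpers): longest stop-free run in each of the
# five reading frames that overlap an annotated +1 coding frame.
# Assumptions: standard genetic code stops; no start-codon requirement;
# frame labels are an illustrative convention, read 5'->3' on each strand.

STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_stop_free_run(seq: str, offset: int) -> int:
    """Length, in codons, of the longest run without a stop codon in one frame."""
    best = current = 0
    # Step through complete codons only, starting at the frame offset.
    for i in range(offset, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            current = 0
        else:
            current += 1
            best = max(best, current)
    return best


def alternative_frames(cds: str) -> dict[str, int]:
    """Longest stop-free run for the five frames other than the annotated +1 frame."""
    cds = cds.upper()
    rc = reverse_complement(cds)
    return {
        "+2": longest_stop_free_run(cds, 1),  # same strand, shifted by 1 nt
        "+3": longest_stop_free_run(cds, 2),  # same strand, shifted by 2 nt
        "-1": longest_stop_free_run(rc, 0),   # complementary strand frames
        "-2": longest_stop_free_run(rc, 1),
        "-3": longest_stop_free_run(rc, 2),
    }


print(alternative_frames("ATGAAATTTGGGCCCTAA"))
# {'+2': 4, '+3': 5, '-1': 6, '-2': 4, '-3': 5}
```

The "-1" frame shares codon boundaries with the annotated frame only when the sequence length is a multiple of three, as in the toy example above.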
References
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403–410
Bourne HR, Sanders DA, McCormick F (1990) The GTPase superfamily: a conserved switch for diverse cell functions. Nature 348:125–132
Carter CW, Duax WL (2002) Did tRNA synthetase classes arise on opposite strands of the same gene? Mol Cell 10:705–708
Chung WY, Wadhawan S, Szklarczyk R, Pond SK, Nekrutenko A (2007) A first look at ARFome: dual-coding genes in mammalian genomes. PLoS Comput Biol 3:e91
Chung BY, Miller WA, Atkins JF, Firth AE (2008) An overlapping essential gene in the Potyviridae. Proc Natl Acad Sci USA 105:5897–5902
de Groot S, Mailund T, Hein J (2007) Comparative annotation of viral genomes with non-conserved gene structure. Bioinformatics 23:1080–1089
de Groot S, Mailund T, Lunter G, Hein J (2008) Investigating selection on viruses: a statistical alignment approach. BMC Bioinform 9:304
Delcher AL, Harmon D, Kasif S, White O, Salzberg SL (1999) Improved microbial gene identification with GLIMMER. Nucl Acids Res 27:4636–4641
Firth AE (2008) Bioinformatic analysis suggests that the Orbivirus VP6 cistron encodes an overlapping gene. Virol J 5:48
Firth AE, Atkins JF (2008a) Bioinformatic analysis suggests that a conserved ORF in the waikaviruses encodes an overlapping gene. Arch Virol 153:1379–1383
Firth AE, Atkins JF (2008b) Bioinformatic analysis suggests that the Cypovirus 1 major core protein cistron harbours an overlapping gene. Virol J 5:62
Firth AE, Atkins JF (2009) Analysis of the coding potential of the partially overlapping 3′ ORF in segment 5 of the plant fijiviruses. Virol J 6:32
Firth AE, Brown CM (2005) Detecting overlapping coding sequences with pairwise alignments. Bioinformatics 21:282–292
Firth AE, Brown CM (2006) Detecting overlapping coding sequences in virus genomes. BMC Bioinform 7:75
Firth AE, Wang QS, Jan E, Atkins JF (2009) Bioinformatic evidence for a stem-loop structure 5′-adjacent to the IGR-IRES and for an overlapping gene in the bee paralysis dicistroviruses. Virol J 6:193
Graur D, Li W-H (2000) Fundamentals of molecular evolution. Sinauer Associates, Sunderland, MA
Hayward BE, Kamiya M, Strain L, Moran V, Campbell R, Hayashizaki Y, Bonthron DT (1998) The human GNAS1 gene is imprinted and encodes distinct paternally and biallelically expressed G proteins. Proc Natl Acad Sci USA 95:10038–10043
Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, Kent WJ (2004) The UCSC Table Browser data retrieval tool. Nucl Acids Res 32:D493–D496
Keese PK, Gibbs A (1992) Origins of genes: “big bang” or continuous creation? Proc Natl Acad Sci USA 89:9489–9493
Kim SH, Mitchell M, Fujii H, Llanos S, Peters G (2003) Absence of p16INK4a and truncation of ARF tumor suppressors in chickens. Proc Natl Acad Sci USA 100:211–216
Klemke M, Kehlenbach RH, Huttner WB (2001) Two overlapping reading frames in a single exon encode interacting proteins—a novel way of gene usage. EMBO J 20:3849–3860
Konstantopoulou I, Ouzounis CA, Drosopoulou E, Yiangou M, Sideras P, Sander C, Scouras ZG (1995) A Drosophila hsp70 gene contains long, antiparallel, coupled open reading frames (LAC ORFs) conserved in homologous loci. J Mol Evol 41:414–420
Kozasa T, Itoh H, Tsukamoto T, Kaziro Y (1988) Isolation and characterization of the human Gs alpha gene. Proc Natl Acad Sci USA 85:2081–2085
Kumar S, Nei M, Dudley J, Tamura K (2008) MEGA: a biologist-centric software for evolutionary analysis of DNA and protein sequences. Brief Bioinform 9:299–306
Lavorgna G, Dahary D, Lehner B, Sorek R, Sanderson CM, Casari G (2004) In search of antisense. Trends Biochem Sci 29:88–94
Levine MA, Modi WS, O’Brien SJ (1991) Mapping of the gene encoding the alpha subunit of the stimulatory G protein of adenylyl cyclase (GNAS1) to 20q13.2–q13.3 in human by in situ hybridization. Genomics 11:478–479
Liang H, Landweber LF (2006) A genome-wide study of dual coding regions in human alternatively spliced genes. Genome Res 16:190–196
McCauley S, de Groot S, Mailund T, Hein J (2007) Annotation of selection strengths in viral genomes. Bioinformatics 23:2978–2986
Miyata T, Yasunaga T (1978) Evolution of overlapping genes. Nature 272:532–535
Monnerjahn C, Techel D, Mohamed SA, Rensing L (2000) A non-stop antisense reading frame in the grp78 gene of Neurospora crassa is homologous to the Achlya klebsiana NAD-gdh gene but is not being transcribed. FEMS Microbiol Lett 183:307–312
Nekrutenko A, He J (2006) Functionality of unspliced XBP1 is required to explain evolution of overlapping reading frames. Trends Genet 22:645–648
Nekrutenko A, Makova KD, Li WH (2002) The K(A)/K(S) ratio test for assessing the protein-coding potential of genomic regions: an empirical and simulation study. Genome Res 12:198–202
Nekrutenko A, Wadhawan S, Goetting-Minesky P, Makova KD (2005) Oscillating evolution of a mammalian locus with overlapping reading frames: an XLalphas/ALEX relay. PLoS Genet 1:e18
Nielsen R, Yang Z (1998) Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148:929–936
Palleja A, Harrington ED, Bork P (2008) Large gene overlaps in prokaryotic genomes: result of functional constraints or mispredictions? BMC Genomics 9:335
Quelle DE, Zindy F, Ashmun RA, Sherr CJ (1995) Alternative reading frames of the INK4a tumor suppressor gene encode two unrelated proteins capable of inducing cell cycle arrest. Cell 83:993–1000
Ribrioux S, Brungger A, Baumgarten B, Seuwen K, John MR (2008) Bioinformatics prediction of overlapping frameshifted translation products in mammalian transcripts. BMC Genomics 9:122
Rodin SN, Ohno S (1995) Two types of aminoacyl-tRNA synthetases could be originally encoded by complementary strands of the same nucleic acid. Orig Life Evol Biosph 25:565–589
Rother KI, Clay OK, Bourquin JP, Silke J, Schaffner W (1997) Long non-stop reading frames on the antisense strand of heat shock protein 70 genes and prion protein (PrP) genes are conserved between species. Biol Chem 378:1521–1530
Sabath N, Graur D, Landan G (2008a) Same-strand overlapping genes in bacteria: compositional determinants of phase bias. Biol Direct 3:36
Sabath N, Landan G, Graur D (2008b) A method for the simultaneous estimation of selection intensities in overlapping genes. PLoS ONE 3:e3996
Sabath N, Price N, Graur D (2009) A potentially novel overlapping gene in the genomes of Israeli acute paralysis virus and its relatives. Virol J 6:144
Schneider A, Souvorov A, Sabath N, Landan G, Gonnet GH, Graur D (2009) Estimates of positive Darwinian selection are inflated by errors in sequencing, annotation, and alignment. Genome Biol Evol 1:114–118
Silke J (1997) The majority of long non-stop reading frames on the antisense strand can be explained by biased codon usage. Gene 194:143–155
Szklarczyk R, Heringa J, Pond SK, Nekrutenko A (2007) Rapid asymmetric evolution of a dual-coding tumor suppressor INK4a/ARF locus contradicts its function. Proc Natl Acad Sci USA 104:12807–12812
Thompson JD, Gibson TJ, Higgins DG (2002) Multiple sequence alignment using ClustalW and ClustalX. Curr Protoc Bioinform Chapter 2: Unit 2.3
Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, Antonarakis SE, Attwood J, Baertsch R, Bailey J, Barlow K, Beck S, Berry E, Birren B, Bloom T, Bork P, Botcherby M, Bray N, Brent MR, Brown DG, Brown SD, Bult C, Burton J, Butler J, Campbell RD, Carninci P, Cawley S, Chiaromonte F, Chinwalla AT, Church DM, Clamp M, Clee C, Collins FS, Cook LL, Copley RR, Coulson A, Couronne O, Cuff J, Curwen V, Cutts T, Daly M, David R, Davies J, Delehaunty KD, Deri J, Dermitzakis ET, Dewey C, Dickens NJ, Diekhans M, Dodge S, Dubchak I, Dunn DM, Eddy SR, Elnitski L, Emes RD, Eswara P, Eyras E, Felsenfeld A, Fewell GA, Flicek P, Foley K, Frankel WN, Fulton LA, Fulton RS, Furey TS, Gage D, Gibbs RA, Glusman G, Gnerre S, Goldman N, Goodstadt L, Grafham D, Graves TA, Green ED, Gregory S, Guigo R, Guyer M, Hardison RC, Haussler D, Hayashizaki Y, Hillier LW, Hinrichs A, Hlavina W, Holzer T, Hsu F, Hua A, Hubbard T, Hunt A, Jackson I, Jaffe DB, Johnson LS, Jones M, Jones TA, Joy A, Kamal M, Karlsson EK et al (2002) Initial sequencing and comparative analysis of the mouse genome. Nature 420:520–562
Williams TA, Wolfe KH, Fares MA (2009) No rosetta stone for a sense-antisense origin of aminoacyl tRNA synthetase classes. Mol Biol Evol 26:445–450
Xu H, Wang P, Fu Y, Zheng Y, Tang Q, Si L, You J, Zhang Z, Zhu Y, Zhou L, Wei Z, Lin B, Hu L, Kong X (2010) Length of the ORF, position of the first AUG and the Kozak motif are important factors in potential dual-coding transcripts. Cell Res 20:445–457
Yoshida H, Matsui T, Yamamoto A, Okada T, Mori K (2001) XBP1 mRNA is induced by ATF6 and spliced by IRE1 in response to ER stress to produce a highly active transcription factor. Cell 107:881–891
Zhang J, Nielsen R, Yang Z (2005) Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol Biol Evol 22:2472–2479
Acknowledgments
We thank Dr. Anton Nekrutenko and an anonymous reviewer for their useful comments. This work was supported in part by US National Library of Medicine Grant LM010009-01 to Dan Graur and Giddy Landan.
Electronic supplementary material
Supplementary Fig. 1
Detection of selection by the SLG (solid lines, P < 0.01) and FB (Firth and Brown; dashed lines) methods on simulated genes with different GC contents (blue, green, and red for 30, 50, and 70%, respectively). Each data point is the percentage of runs in which a method detected selection. We used same-strand phase-1 overlaps, a length of 300 codons, and t = 0.4; ω_k was set to 0.2, and ω_h varied between 0.2 and 1.
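As background for why base composition is a natural variable for overlapping frames, the sketch below computes the probability that a random codon is a stop codon, and the chance that a frame of n codons contains no stop, as a function of GC content. It assumes a deliberately simple model in which nucleotides are independent, with P(G) = P(C) = GC/2 and P(A) = P(T) = (1 - GC)/2. The function names are hypothetical, and this is not the simulation procedure used to produce the figure.

```python
# Back-of-the-envelope sketch (hypothetical helpers), assuming independent
# nucleotides with P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2.


def stop_codon_probability(gc: float) -> float:
    """Probability that a random codon is TAA, TAG or TGA under the model above."""
    at_half = (1.0 - gc) / 2.0  # P(A) = P(T)
    gc_half = gc / 2.0          # P(G) = P(C)
    p_taa = at_half ** 3
    p_tag = at_half ** 2 * gc_half
    p_tga = at_half * gc_half * at_half
    return p_taa + p_tag + p_tga


def prob_stop_free(n_codons: int, gc: float) -> float:
    """Probability that n consecutive random codons contain no stop codon."""
    return (1.0 - stop_codon_probability(gc)) ** n_codons


for gc in (0.3, 0.5, 0.7):
    print(f"GC={gc:.0%}: P(stop)={stop_codon_probability(gc):.4f}, "
          f"P(300 codons stop-free)={prob_stop_free(300, gc):.2e}")
```

Because all three stop codons are AT-rich, the per-codon stop probability falls as GC content rises (it is exactly 3/64 at 50% GC under this model), so long stop-free alternative frames are far more likely in GC-rich sequences.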
Supplementary Fig. 2
Amino acid alignment of GNAS1 genes from human and mouse. The overlap region with ARF is highlighted in red.
multiple_alignments.txt: Multiple alignments used in the study, in FASTA format.
Cite this article
Sabath, N., Graur, D. Detection of Functional Overlapping Genes: Simulation and Case Studies. J Mol Evol 71, 308–316 (2010). https://doi.org/10.1007/s00239-010-9386-3