Detection of Functional Overlapping Genes: Simulation and Case Studies

Journal of Molecular Evolution

Abstract

For any protein-coding gene, there is a non-zero probability that at least one of the five possible overlapping sequences will contain an open reading frame (ORF) of a length that may suffice to encode a functional protein. It is, however, very difficult to determine whether such an ORF is functional. Recently, we proposed a method that predicts an overlapping ORF to be functional if it can be shown to have been subject to purifying selection during its evolution. Here, we use simulation to test this method under several conditions and compare it with the method of Firth and Brown. We found that under most conditions, our method detects functional overlapping genes with higher sensitivity than Firth and Brown’s method, while maintaining high specificity. Further, we tested the hypothesis that the two aminoacyl tRNA synthetase classes originated from a pair of overlapping genes. A central piece of evidence ostensibly supporting this hypothesis is the assertion that an overlapping ORF of a heat-shock protein-70 gene, which exhibits some similarity to class 2 aminoacyl tRNA synthetases, is functional. We found a signature of purifying selection only in highly divergent sequences, suggesting that the method yields false positives at high sequence divergence and that the overlapping ORF is not a functional gene. Finally, we examined three cases of overlap in the human genome. We found varying signatures of purifying selection acting on these overlaps, raising the possibility that two of the overlapping genes may not be functional.
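The notion of "five possible overlapping sequences" can be illustrated with a minimal Python sketch. It assumes the gene is supplied as an uppercase DNA string read in frame 0 on the sense strand, uses the standard genetic code's stop codons, and treats an ORF simply as a run of consecutive non-stop codons (no start codon is required). The function names and frame labels are hypothetical and are not taken from the article; this is not the authors' method, which tests for purifying selection rather than ORF length.

```python
# Illustrative sketch only: scan the five alternative reading frames of a gene
# for their longest stop-free stretch. Names and labels are hypothetical.

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_stop_free_run(seq, offset):
    """Length (in codons) of the longest run of non-stop codons from offset."""
    best = current = 0
    for i in range(offset, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            current = 0  # a stop codon ends the current run
        else:
            current += 1
            best = max(best, current)
    return best


def alternative_frame_orfs(gene):
    """Longest stop-free run in each of the five frames other than frame 0."""
    gene = gene.upper()
    rc = reverse_complement(gene)
    frames = {
        "sense+1": (gene, 1),      # same strand, shifted by one nucleotide
        "sense+2": (gene, 2),      # same strand, shifted by two nucleotides
        "antisense+0": (rc, 0),    # opposite strand, three frames
        "antisense+1": (rc, 1),
        "antisense+2": (rc, 2),
    }
    return {name: longest_stop_free_run(s, off)
            for name, (s, off) in frames.items()}


if __name__ == "__main__":
    print(alternative_frame_orfs("CATGATTTT"))
```

A long stop-free stretch in one of these frames is only a candidate; as the abstract stresses, deciding whether such an ORF is functional requires additional evidence such as a signature of purifying selection.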

References

  • Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403–410

  • Bourne HR, Sanders DA, McCormick F (1990) The GTPase superfamily: a conserved switch for diverse cell functions. Nature 348:125–132

  • Carter CW, Duax WL (2002) Did tRNA synthetase classes arise on opposite strands of the same gene? Mol Cell 10:705–708

  • Chung WY, Wadhawan S, Szklarczyk R, Pond SK, Nekrutenko A (2007) A first look at ARFome: dual-coding genes in mammalian genomes. PLoS Comput Biol 3:e91

  • Chung BY, Miller WA, Atkins JF, Firth AE (2008) An overlapping essential gene in the Potyviridae. Proc Natl Acad Sci USA 105:5897–5902

  • de Groot S, Mailund T, Hein J (2007) Comparative annotation of viral genomes with non-conserved gene structure. Bioinformatics 23:1080–1089

  • de Groot S, Mailund T, Lunter G, Hein J (2008) Investigating selection on viruses: a statistical alignment approach. BMC Bioinform 9:304

  • Delcher AL, Harmon D, Kasif S, White O, Salzberg SL (1999) Improved microbial gene identification with GLIMMER. Nucl Acids Res 27:4636–4641

  • Firth AE (2008) Bioinformatic analysis suggests that the Orbivirus VP6 cistron encodes an overlapping gene. Virol J 5:48

  • Firth AE, Atkins JF (2008a) Bioinformatic analysis suggests that a conserved ORF in the waikaviruses encodes an overlapping gene. Arch Virol 153:1379–1383

  • Firth AE, Atkins JF (2008b) Bioinformatic analysis suggests that the Cypovirus 1 major core protein cistron harbours an overlapping gene. Virol J 5:62

  • Firth AE, Atkins JF (2009) Analysis of the coding potential of the partially overlapping 3′ ORF in segment 5 of the plant fijiviruses. Virol J 6:32

  • Firth AE, Brown CM (2005) Detecting overlapping coding sequences with pairwise alignments. Bioinformatics 21:282–292

  • Firth AE, Brown CM (2006) Detecting overlapping coding sequences in virus genomes. BMC Bioinform 7:75

  • Firth AE, Wang QS, Jan E, Atkins JF (2009) Bioinformatic evidence for a stem-loop structure 5′-adjacent to the IGR-IRES and for an overlapping gene in the bee paralysis dicistroviruses. Virol J 6:193

  • Graur D, Li W-H (2000) Fundamentals of molecular evolution. Sinauer Associates, Sunderland, MA

  • Hayward BE, Kamiya M, Strain L, Moran V, Campbell R, Hayashizaki Y, Bonthron DT (1998) The human GNAS1 gene is imprinted and encodes distinct paternally and biallelically expressed G proteins. Proc Natl Acad Sci USA 95:10038–10043

  • Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, Kent WJ (2004) The UCSC Table Browser data retrieval tool. Nucl Acids Res 32:D493–D496

  • Keese PK, Gibbs A (1992) Origins of genes: “big bang” or continuous creation? Proc Natl Acad Sci USA 89:9489–9493

  • Kim SH, Mitchell M, Fujii H, Llanos S, Peters G (2003) Absence of p16INK4a and truncation of ARF tumor suppressors in chickens. Proc Natl Acad Sci USA 100:211–216

  • Klemke M, Kehlenbach RH, Huttner WB (2001) Two overlapping reading frames in a single exon encode interacting proteins—a novel way of gene usage. EMBO J 20:3849–3860

  • Konstantopoulou I, Ouzounis CA, Drosopoulou E, Yiangou M, Sideras P, Sander C, Scouras ZG (1995) A Drosophila hsp70 gene contains long, antiparallel, coupled open reading frames (LAC ORFs) conserved in homologous loci. J Mol Evol 41:414–420

  • Kozasa T, Itoh H, Tsukamoto T, Kaziro Y (1988) Isolation and characterization of the human Gs alpha gene. Proc Natl Acad Sci USA 85:2081–2085

  • Kumar S, Nei M, Dudley J, Tamura K (2008) MEGA: a biologist-centric software for evolutionary analysis of DNA and protein sequences. Brief Bioinform 9:299–306

  • Lavorgna G, Dahary D, Lehner B, Sorek R, Sanderson CM, Casari G (2004) In search of antisense. Trends Biochem Sci 29:88–94

  • Levine MA, Modi WS, O’Brien SJ (1991) Mapping of the gene encoding the alpha subunit of the stimulatory G protein of adenylyl cyclase (GNAS1) to 20q13.2–q13.3 in human by in situ hybridization. Genomics 11:478–479

  • Liang H, Landweber LF (2006) A genome-wide study of dual coding regions in human alternatively spliced genes. Genome Res 16:190–196

  • McCauley S, de Groot S, Mailund T, Hein J (2007) Annotation of selection strengths in viral genomes. Bioinformatics 23:2978–2986

  • Miyata T, Yasunaga T (1978) Evolution of overlapping genes. Nature 272:532–535

  • Monnerjahn C, Techel D, Mohamed SA, Rensing L (2000) A non-stop antisense reading frame in the grp78 gene of Neurospora crassa is homologous to the Achlya klebsiana NAD-gdh gene but is not being transcribed. FEMS Microbiol Lett 183:307–312

  • Nekrutenko A, He J (2006) Functionality of unspliced XBP1 is required to explain evolution of overlapping reading frames. Trends Genet 22:645–648

  • Nekrutenko A, Makova KD, Li WH (2002) The K(A)/K(S) ratio test for assessing the protein-coding potential of genomic regions: an empirical and simulation study. Genome Res 12:198–202

  • Nekrutenko A, Wadhawan S, Goetting-Minesky P, Makova KD (2005) Oscillating evolution of a mammalian locus with overlapping reading frames: an XLalphas/ALEX relay. PLoS Genet 1:e18

  • Nielsen R, Yang Z (1998) Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148:929–936

  • Palleja A, Harrington ED, Bork P (2008) Large gene overlaps in prokaryotic genomes: result of functional constraints or mispredictions? BMC Genomics 9:335

  • Quelle DE, Zindy F, Ashmun RA, Sherr CJ (1995) Alternative reading frames of the INK4a tumor suppressor gene encode two unrelated proteins capable of inducing cell cycle arrest. Cell 83:993–1000

  • Ribrioux S, Brungger A, Baumgarten B, Seuwen K, John MR (2008) Bioinformatics prediction of overlapping frameshifted translation products in mammalian transcripts. BMC Genomics 9:122

  • Rodin SN, Ohno S (1995) Two types of aminoacyl-tRNA synthetases could be originally encoded by complementary strands of the same nucleic acid. Orig Life Evol Biosph 25:565–589

  • Rother KI, Clay OK, Bourquin JP, Silke J, Schaffner W (1997) Long non-stop reading frames on the antisense strand of heat shock protein 70 genes and prion protein (PrP) genes are conserved between species. Biol Chem 378:1521–1530

  • Sabath N, Graur D, Landan G (2008a) Same-strand overlapping genes in bacteria: compositional determinants of phase bias. Biol Direct 3:36

  • Sabath N, Landan G, Graur D (2008b) A method for the simultaneous estimation of selection intensities in overlapping genes. PLoS ONE 3:e3996

  • Sabath N, Price N, Graur D (2009) A potentially novel overlapping gene in the genomes of Israeli acute paralysis virus and its relatives. Virol J 6:144

  • Schneider A, Souvorov A, Sabath N, Landan G, Gonnet GH, Graur D (2009) Estimates of positive Darwinian selection are inflated by errors in sequencing, annotation, and alignment. Genome Biol Evol 1:114–118

  • Silke J (1997) The majority of long non-stop reading frames on the antisense strand can be explained by biased codon usage. Gene 194:143–155

  • Szklarczyk R, Heringa J, Pond SK, Nekrutenko A (2007) Rapid asymmetric evolution of a dual-coding tumor suppressor INK4a/ARF locus contradicts its function. Proc Natl Acad Sci USA 104:12807–12812

  • Thompson JD, Gibson TJ, Higgins DG (2002) Multiple sequence alignment using ClustalW and ClustalX. Curr Protoc Bioinform Chapter 2: Unit 2.3

  • Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, Antonarakis SE, Attwood J, Baertsch R, Bailey J, Barlow K, Beck S, Berry E, Birren B, Bloom T, Bork P, Botcherby M, Bray N, Brent MR, Brown DG, Brown SD, Bult C, Burton J, Butler J, Campbell RD, Carninci P, Cawley S, Chiaromonte F, Chinwalla AT, Church DM, Clamp M, Clee C, Collins FS, Cook LL, Copley RR, Coulson A, Couronne O, Cuff J, Curwen V, Cutts T, Daly M, David R, Davies J, Delehaunty KD, Deri J, Dermitzakis ET, Dewey C, Dickens NJ, Diekhans M, Dodge S, Dubchak I, Dunn DM, Eddy SR, Elnitski L, Emes RD, Eswara P, Eyras E, Felsenfeld A, Fewell GA, Flicek P, Foley K, Frankel WN, Fulton LA, Fulton RS, Furey TS, Gage D, Gibbs RA, Glusman G, Gnerre S, Goldman N, Goodstadt L, Grafham D, Graves TA, Green ED, Gregory S, Guigo R, Guyer M, Hardison RC, Haussler D, Hayashizaki Y, Hillier LW, Hinrichs A, Hlavina W, Holzer T, Hsu F, Hua A, Hubbard T, Hunt A, Jackson I, Jaffe DB, Johnson LS, Jones M, Jones TA, Joy A, Kamal M, Karlsson EK et al (2002) Initial sequencing and comparative analysis of the mouse genome. Nature 420:520–562

  • Williams TA, Wolfe KH, Fares MA (2009) No rosetta stone for a sense-antisense origin of aminoacyl tRNA synthetase classes. Mol Biol Evol 26:445–450

  • Xu H, Wang P, Fu Y, Zheng Y, Tang Q, Si L, You J, Zhang Z, Zhu Y, Zhou L, Wei Z, Lin B, Hu L, Kong X (2010) Length of the ORF, position of the first AUG and the Kozak motif are important factors in potential dual-coding transcripts. Cell Res 20:445–457

  • Yoshida H, Matsui T, Yamamoto A, Okada T, Mori K (2001) XBP1 mRNA is induced by ATF6 and spliced by IRE1 in response to ER stress to produce a highly active transcription factor. Cell 107:881–891

  • Zhang J, Nielsen R, Yang Z (2005) Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol Biol Evol 22:2472–2479

Acknowledgments

We thank Dr. Anton Nekrutenko and an anonymous reviewer for their useful comments. This work was supported in part by US National Library of Medicine Grant LM010009-01 to Dan Graur and Giddy Landan.

Author information

Corresponding author

Correspondence to Niv Sabath.

Electronic supplementary material

Supplementary Fig. 1

Detection of selection by the SLG (solid lines, P < 0.01) and FB (dashed lines) methods on simulated genes with different GC contents (blue, green, and red for 30, 50, and 70%, respectively). Each data point is the percentage of runs in which the methods detected selection. We used same-strand phase 1 overlaps with a length of 300 codons and t = 0.4; ω_k was set to 0.2, and ω_h varied between 0.2 and 1 (JPEG 304 kb)

Supplementary Fig. 2

Amino acid alignment of GNAS1 genes from human and mouse. The overlap region with ARF is highlighted in red (JPEG 120 kb)

multiple_alignments.txt

Multiple alignments used in the study, in FASTA format

Cite this article

Sabath, N., Graur, D. Detection of Functional Overlapping Genes: Simulation and Case Studies. J Mol Evol 71, 308–316 (2010). https://doi.org/10.1007/s00239-010-9386-3
