Abstract
Overlapping genes (encoded at the same DNA locus but in different reading frames) are thought to be rare and have therefore been largely neglected. In a test set of 800 viruses, we found more than 350 potential overlapping open reading frames of >500 bp that generate BLAST hits, indicating a possible biological function. Five overlaps of more than 2000 bp were found, and the largest may even contain triple overlaps. To handle the large number of BLAST searches required to test all detected open reading frames, we compared two clustering strategies (BLASTCLUST and k-means) and queried the database with only one representative per cluster. Our results show that, for both clustering methods, this approach achieves a significant speed-up while retaining high-quality results (>99% precision compared to single queries). Wet-lab experiments are needed to show whether the detected overlapping reading frames are biologically functional.
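As an illustration of what "overlapping open reading frames in different frames" means computationally, the following is a minimal sketch and not the authors' pipeline. It assumes ORFs are defined as ATG-to-stop stretches on the forward strand only, and the function names find_orfs and overlapping_pairs, as well as the min_len and min_overlap parameters, are hypothetical names chosen for this example.

```python
from itertools import combinations

# Standard stop codons; the ATG-to-stop ORF definition is an assumption of this sketch.
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_len=6):
    """Return (start, end, frame) tuples for forward-strand ORFs.

    Coordinates are 0-based and half-open; the stop codon is included.
    Only the first ATG of each ORF is used as its start.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                end = i + 3
                if end - start >= min_len:
                    orfs.append((start, end, frame))
                start = None
    return orfs


def overlapping_pairs(orfs, min_overlap=1):
    """Return (orf_a, orf_b, shared_bp) for ORF pairs in different frames."""
    pairs = []
    for a, b in combinations(orfs, 2):
        if a[2] == b[2]:
            continue  # same frame: not an overlapping-gene candidate
        shared = min(a[1], b[1]) - max(a[0], b[0])
        if shared >= min_overlap:
            pairs.append((a, b, shared))
    return pairs
```

Calling overlapping_pairs(find_orfs(genome, min_len=500), min_overlap=500) would be one way to approximate a 500 bp threshold like the one used in the abstract. A fuller analysis would also scan the reverse complement and would pass the candidates on to a BLAST search.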
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Neuhaus, K., Oelke, D., Fürst, D., Scherer, S., Keim, D.A. (2010). Towards Automatic Detecting of Overlapping Genes - Clustered BLAST Analysis of Viral Genomes. In: Pizzuti, C., Ritchie, M.D., Giacobini, M. (eds) Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. EvoBIO 2010. Lecture Notes in Computer Science, vol 6023. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12211-8_20
DOI: https://doi.org/10.1007/978-3-642-12211-8_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12210-1
Online ISBN: 978-3-642-12211-8
eBook Packages: Computer Science (R0)