Towards Automatic Detecting of Overlapping Genes - Clustered BLAST Analysis of Viral Genomes

  • Conference paper
Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics (EvoBIO 2010)

Abstract

Overlapping genes (encoded on the same DNA locus but in different frames) are thought to be rare and were therefore largely neglected in the past. In a test set of 800 viruses, we found more than 350 potential overlapping open reading frames of >500 bp that generate BLAST hits, indicating a possible biological function. Interestingly, five overlaps of more than 2000 bp were found; the largest may even contain triple overlaps. To perform the vast number of BLAST searches required to test all detected open reading frames, we compared two clustering strategies (BLASTCLUST and k-means) and queried the database with only one representative per cluster. Our results show that this approach achieves a significant speed-up while retaining high-quality results (>99% precision compared to single queries) for both clustering methods. Future wet lab experiments are needed to show whether the detected overlapping reading frames are biologically functional.
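
To make the notion of overlapping open reading frames concrete, the following is a minimal Python sketch and not the authors' pipeline. It assumes a simplified ORF definition (ATG to the first in-frame stop codon, forward strand only) and a toy sequence; the function names `find_orfs` and `overlapping_pairs`, the `min_len` and `min_overlap` parameters, and the example sequence are all hypothetical and chosen for illustration. The abstract's >500 bp threshold would correspond to a much larger `min_overlap` on real genomes.

```python
from itertools import combinations

STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(seq, min_len=6):
    """Return (start, end, frame) tuples for ORFs on the forward strand.

    Assumption: an ORF runs from the first ATG in a frame to the next
    in-frame stop codon. Coordinates are 0-based and end-exclusive, and
    the stop codon is included in the ORF.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


def overlapping_pairs(orfs, min_overlap=1):
    """Return (orf_a, orf_b, shared_bases) for ORFs in different frames."""
    pairs = []
    for a, b in combinations(orfs, 2):
        if a[2] == b[2]:
            continue  # same frame: not an overlap in the sense used here
        shared = min(a[1], b[1]) - max(a[0], b[0])
        if shared >= min_overlap:
            pairs.append((a, b, shared))
    return pairs


# Toy sequence (hypothetical): one ORF in frame 0 and one in frame 1.
orfs = find_orfs("ATGCATGGCCTTACGTAACTGA")
print(overlapping_pairs(orfs))  # [((0, 18, 0), (4, 22, 1), 14)]
```

A fuller analysis would also scan the reverse complement to cover all six reading frames, and would then pass each candidate ORF to a similarity search such as BLAST; both steps are omitted here.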

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Neuhaus, K., Oelke, D., Fürst, D., Scherer, S., Keim, D.A. (2010). Towards Automatic Detecting of Overlapping Genes - Clustered BLAST Analysis of Viral Genomes. In: Pizzuti, C., Ritchie, M.D., Giacobini, M. (eds) Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. EvoBIO 2010. Lecture Notes in Computer Science, vol 6023. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12211-8_20

  • DOI: https://doi.org/10.1007/978-3-642-12211-8_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12210-1

  • Online ISBN: 978-3-642-12211-8

  • eBook Packages: Computer Science, Computer Science (R0)
