Towards Automatic Detecting of Overlapping Genes - Clustered BLAST Analysis of Viral Genomes

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6023)


Overlapping genes (encoded on the same DNA locus but in different reading frames) are thought to be rare and were therefore largely neglected in the past. In a test set of 800 viruses, we found more than 350 potential overlapping open reading frames of >500 bp that generate BLAST hits, indicating a possible biological function. Interestingly, five overlaps of more than 2000 bp were found; the largest may even contain triple overlaps. To perform the large number of BLAST searches required to test all detected open reading frames, we compared two clustering strategies (BLASTCLUST and k-means) and queried the database with only one representative per cluster. Our results show that this approach achieves a significant speed-up while retaining high-quality results (>99% precision compared to single queries) for both clustering methods. Future wet-lab experiments are needed to show whether the detected overlapping reading frames are biologically functional.
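As an illustration of what "overlapping open reading frames in different frames" means computationally, the following is a minimal sketch and not the authors' pipeline. It assumes forward-strand ORFs only, ATG as the sole start codon, and the three standard stop codons; the function names `find_orfs` and `overlapping_pairs` and the toy sequence are hypothetical. The `min_overlap` parameter plays the role of a length threshold such as the 500 bp mentioned above.

```python
from itertools import combinations

STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_len=6):
    """Return (start, end, frame) for each ATG..stop ORF on the forward strand.

    start is inclusive, end is exclusive and includes the stop codon.
    Assumption: ATG is the only start codon considered.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if end - start >= min_len:
                    orfs.append((start, end, frame))
                start = None
    return orfs

def overlapping_pairs(orfs, min_overlap=1):
    """Return (orf_a, orf_b, overlap_bp) for ORF pairs in different frames."""
    pairs = []
    for a, b in combinations(orfs, 2):
        if a[2] == b[2]:
            continue  # same frame: not an overlapping-gene candidate
        overlap = min(a[1], b[1]) - max(a[0], b[0])
        if overlap >= min_overlap:
            pairs.append((a, b, overlap))
    return pairs

# Toy sequence with one ORF in frame 0 and one in frame 1.
demo = "ATGCATGCCGCATAACTGA"
orfs = find_orfs(demo)
pairs = overlapping_pairs(orfs)
```

On real genomes, one would also scan the reverse complement and pass something like `min_overlap=500` to keep only long overlaps; candidates found this way would still need a homology search and experimental validation before any functional claim.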


overlapping genes; clustering; BLAST analysis





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  1. Chair of Microbial Ecology, Technische Universität München, Freising, Germany
  2. Chair of Data Analysis and Visualization, Universität Konstanz, Konstanz, Germany
  3. Chair of Data Management and Data Exploration, Rheinisch-Westfälische Technische Hochschule Aachen, Aachen, Germany
