Volume 108, Issue 1, pp 9–17

Using the COG Database to Improve Gene Recognition in Complete Genomes

  • D. A. Natale
  • M. Y. Galperin
  • R. L. Tatusov
  • E. V. Koonin


A complete understanding of the biology of an organism necessarily starts with knowledge of its genetic makeup. Proteins encoded in a genome must be identified and characterized, and the presence or absence of specific sets of proteins must be noted in order to determine the possible biochemical pathways or functional systems utilized by that organism. The COG database presents a set of tools suited to these purposes, including the ability to select protein families (COGs) that contain proteins from a specified set of species. The selection is based upon a phylogenetic pattern, which is a shorthand representation of the presence or absence of a particular species in a COG. Here we present the use of phylogenetic patterns as a means to perform targeted searches for undetected protein-coding genes in complete genomes.
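The idea of a phylogenetic pattern can be illustrated with a minimal sketch. It assumes each COG is recorded as the set of species represented in it, and that a pattern is written with one character per species (a species code when present, `-` when absent). The species codes, COG names, membership data, and function names below are all hypothetical and are not taken from the COG database. COGs that contain every species except one are natural starting points for a targeted search for an undetected gene in that one genome.

```python
from typing import Dict, List, Set

# Hypothetical species codes and COG membership, invented for illustration only.
SPECIES: List[str] = ["A", "B", "C", "D"]

COGS: Dict[str, Set[str]] = {
    "COG_x1": {"A", "B", "C", "D"},
    "COG_x2": {"A", "B", "D"},
    "COG_x3": {"A", "C"},
    "COG_x4": {"A", "B", "D"},
}


def phylogenetic_pattern(members: Set[str], species: List[str]) -> str:
    """Return one character per species: its code if present, '-' if absent."""
    return "".join(s if s in members else "-" for s in species)


def cogs_missing_only(target: str,
                      cogs: Dict[str, Set[str]],
                      species: List[str]) -> List[str]:
    """Return names of COGs represented in every species except `target`."""
    others = set(species) - {target}
    return sorted(name for name, members in cogs.items() if members == others)


if __name__ == "__main__":
    for name, members in sorted(COGS.items()):
        print(name, phylogenetic_pattern(members, SPECIES))
    # COGs whose pattern lacks only species "C" are candidates for a
    # follow-up search of that genome for a gene missed during annotation.
    print(cogs_missing_only("C", COGS, SPECIES))
```

In this sketch the selection step is a plain set comparison; a real analysis would still need a sequence-level search of the target genome to confirm whether a gene is truly absent or merely unannotated.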

essential genes, microbial genomes, phylogenetic patterns, short proteins





Copyright information

© Kluwer Academic Publishers 2000

Authors and Affiliations

  • D. A. Natale
    • 1
  • M. Y. Galperin
    • 2
  • R. L. Tatusov
    • 1
  • E. V. Koonin
    • 1
  1. National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, USA
  2. National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, USA
