Abstract
A complete understanding of the biology of an organism necessarily starts with knowledge of its genetic makeup. Proteins encoded in a genome must be identified and characterized, and the presence or absence of specific sets of proteins must be noted to determine which biochemical pathways or functional systems the organism may use. The COG database provides a set of tools suited to these purposes, including the ability to select protein families (COGs) that contain proteins from a specified set of species. The selection is based on a phylogenetic pattern, a shorthand representation of which species are present in, or absent from, a given COG. Here we show how phylogenetic patterns can be used to perform targeted searches for undetected protein-coding genes in complete genomes.
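The pattern-based selection described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the COG database's own implementation. The one-letter species codes, COG identifiers, membership sets, the helper names, and the `min_others` threshold are all hypothetical. A pattern is written as a string with one position per genome: the genome's code if it contributes a protein to the COG, `-` if it does not. COGs missing from one genome but present in most others are flagged as places where a gene may have been overlooked.

```python
from typing import Dict, List, Set

# Hypothetical one-letter codes for five genomes (illustrative only;
# these are not the codes used by the COG database).
SPECIES: List[str] = ["e", "h", "b", "m", "r"]

# Toy COGs: each maps to the set of genomes contributing at least one protein.
COGS: Dict[str, Set[str]] = {
    "COG_A": {"e", "h", "b", "m", "r"},
    "COG_B": {"e", "h", "b", "r"},
    "COG_C": {"e", "b"},
    "COG_D": {"e", "h", "b", "m"},
}


def phylogenetic_pattern(members: Set[str], species: List[str] = SPECIES) -> str:
    """Return the pattern string: the species code if present, '-' if absent."""
    return "".join(code if code in members else "-" for code in species)


def select_cogs(cogs: Dict[str, Set[str]], required: Set[str],
                excluded: Set[str] = frozenset()) -> List[str]:
    """COGs that contain every required species and none of the excluded ones."""
    return sorted(cog for cog, members in cogs.items()
                  if required <= members and not (excluded & members))


def search_candidates(cogs: Dict[str, Set[str]], target: str,
                      min_others: int) -> List[str]:
    """COGs absent from `target` but present in at least `min_others` other genomes.

    These are the families for which a gene might have been missed in `target`
    and could be searched for directly in its genome sequence. The simple
    count threshold is a hypothetical heuristic chosen for illustration.
    """
    return sorted(cog for cog, members in cogs.items()
                  if target not in members and len(members) >= min_others)


if __name__ == "__main__":
    for cog, members in sorted(COGS.items()):
        print(cog, phylogenetic_pattern(members))
    print(select_cogs(COGS, required={"e", "b"}, excluded={"m"}))
    print(search_candidates(COGS, target="m", min_others=4))
```

With this toy data, COG_B has the pattern `ehb-r`: it is present in every genome except `m`, so `search_candidates(COGS, "m", 4)` returns `["COG_B"]` as the only family worth a targeted search in that genome. Requiring `e` and `b` while excluding `m` selects COG_B and COG_C.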
About this article
Cite this article
Natale, D.A., Galperin, M.Y., Tatusov, R.L. et al. Using the COG Database to Improve Gene Recognition in Complete Genomes. Genetica 108, 9–17 (2000). https://doi.org/10.1023/A:1004031323748