Automatic Orthologous-Protein-Clustering from Multiple Complete-Genomes by the Best Reciprocal BLAST Hits

  • Sunshin Kim
  • Kwang Su Jung
  • Keun Ho Ryu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3916)


Although the number of completely sequenced genomes has grown quickly in recent years, methods that predict protein function by homology across these genomes remain underused. Constructing OPCs (Orthologous Protein Clusters) from the best reciprocal BLAST hits among multiple complete genomes has proved a successful technique, but building the OPCs by hand is time-consuming. Here we propose an automatic method that clusters OPs (Orthologous Proteins) from multiple complete genomes. It extends INPARANOID, an automatic program that detects OPs between two complete genomes. We also prove all possible clusterings mathematically.
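The idea of best reciprocal BLAST hits can be illustrated with a minimal Python sketch. It is not the authors' algorithm and not INPARANOID. It assumes the BLAST bit scores have already been parsed into a dictionary, that protein identifiers follow a hypothetical "genome|protein" naming convention, and that clusters are simply the connected components of the reciprocal-best-hit graph. The names `best_hits`, `reciprocal_best_hits`, `cluster`, and `example_scores` are illustrative only.

```python
from collections import defaultdict


def genome_of(protein_id):
    # Assumed id convention: "genome|protein", e.g. "A|a1".
    return protein_id.split("|")[0]


def best_hits(scores):
    """For each (query, target genome), keep the highest-scoring target protein."""
    best = {}
    for (query, target), score in scores.items():
        if genome_of(query) == genome_of(target):
            continue  # ignore hits within the same genome
        key = (query, genome_of(target))
        if key not in best or score > best[key][1]:
            best[key] = (target, score)
    return {key: hit for key, (hit, _) in best.items()}


def reciprocal_best_hits(scores):
    """Return pairs of proteins that are each other's best hit across two genomes."""
    best = best_hits(scores)
    pairs = set()
    for (query, _), target in best.items():
        if best.get((target, genome_of(query))) == query:
            pairs.add(frozenset((query, target)))
    return pairs


def cluster(pairs):
    """Group proteins linked by reciprocal best hits (connected components)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for pair in pairs:
        a, b = tuple(pair)
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for protein in list(parent):
        groups[find(protein)].add(protein)
    return sorted(sorted(group) for group in groups.values())


# Made-up scores for three genomes A, B, C (illustration only, not real data).
example_scores = {
    ("A|a1", "B|b1"): 200, ("A|a1", "B|b2"): 90,
    ("B|b1", "A|a1"): 198, ("B|b2", "A|a1"): 85,
    ("A|a1", "C|c1"): 150, ("C|c1", "A|a1"): 149,
    ("B|b1", "C|c1"): 170, ("C|c1", "B|b1"): 171,
    ("A|a2", "B|b2"): 60, ("B|b2", "A|a2"): 50,
}

if __name__ == "__main__":
    print(cluster(reciprocal_best_hits(example_scores)))
```

In this toy input, a1, b1, and c1 are mutual best hits across all three genomes and end up in one cluster, while a2 and b2 are left out because b2's best hit in genome A is a1 rather than a2. Connected components are the loosest possible grouping rule; a stricter variant could require every pair in a cluster to be a reciprocal best hit.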


Protein Pair, Orthologous Protein, Automatic Program, Common Protein, Reciprocal Blast





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Sunshin Kim (1)
  • Kwang Su Jung (1)
  • Keun Ho Ryu (1)
  1. Database/Bioinformatics Laboratory, Department of Computer Science, Chungbuk National University, Cheongju, South Korea
