A Hybrid Grid and Its Application to Orthologous Groups Clustering

  • Tae-Kyung Kim
  • Kyung-Ran Kim
  • Sang-Keun Oh
  • Jong-Hak Lee
  • Wan-Sup Cho
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4216)


Orthologous groups are useful in genome annotation, studies of gene evolution, and comparative genomics. However, the construction of orthologous groups is difficult to automate and becomes increasingly time-consuming as the number of genome sequences grows. Furthermore, it is not easy to guarantee the accuracy of automatically constructed orthologous groups. We propose an automatic orthologous group construction system for a large number of genomes. A hybrid grid computer system, consisting of 40 PCs, has been devised for fast construction of orthologous groups from a large number of genome sequences. The grid system constructs orthologous groups for 89 complete prokaryotic genomes in just one week, whereas the same task takes 8 months on a single computer system. The system is also easily extended to incorporate new genomes into the existing orthologous groups. In our orthologous group construction experiments, more than 85% of the constructed orthologous groups coincide with those of KO (KEGG Orthology) and COGs (Clusters of Orthologous Groups of proteins). Note that KO and COGs have been constructed manually or semi-automatically, at the expense of extensibility to newly completed genomes.
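The timing figures quoted above imply a rough speedup and per-node efficiency, which the short calculation below works out. It is only a back-of-the-envelope sketch: the 30-day month is an assumed conversion, the function names are illustrative, and the efficiency figure treats all 40 PCs as identical, which a hybrid grid need not satisfy.

```python
# Back-of-the-envelope speedup estimate from the figures in the abstract.
# Assumption (not from the paper): one month is counted as 30 days.
DAYS_PER_MONTH = 30


def speedup(serial_days: float, parallel_days: float) -> float:
    """Ratio of single-machine time to grid time."""
    return serial_days / parallel_days


def efficiency(speedup_value: float, nodes: int) -> float:
    """Speedup divided by node count (assumes identical nodes)."""
    return speedup_value / nodes


serial_days = 8 * DAYS_PER_MONTH  # "8 months on a single computer system"
parallel_days = 7                 # "one week" on the grid
nodes = 40                        # "consisting of 40 PCs"

s = speedup(serial_days, parallel_days)
e = efficiency(s, nodes)
print(f"speedup ~ {s:.1f}x, efficiency ~ {e:.0%}")
```

Under these assumptions the grid is roughly 34 times faster than a single machine, which corresponds to a parallel efficiency of about 86% across 40 nodes. Both numbers shift with the month-length convention and with how precisely "a week" and "8 months" were measured.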


Grid Computing; Orthologous Group; Master Node; Orthologous Cluster; Propose Cluster Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Tae-Kyung Kim (1)
  • Kyung-Ran Kim (2)
  • Sang-Keun Oh (2)
  • Jong-Hak Lee (3)
  • Wan-Sup Cho (2)

  1. Dept. of Information Industrial Engineering, Chungbuk National University, Cheongju, Chungbuk, Korea
  2. Dept. of Management Information Systems, Chungbuk National University, Cheongju, Chungbuk, Korea
  3. Division of CICE, Catholic University of Daegu, Gyeongsan, Gyeongbuk, Korea
