Multi-genome Core Pathway Identification through Gene Clustering

  • Dimitrios M. Vitsios
  • Fotis E. Psomopoulos
  • Pericles A. Mitkas
  • Christos A. Ouzounis
Conference paper
Part of the IFIP Advances in Information and Communication Technology book series (IFIPAICT, volume 382)


As large-scale bioinformatics studies move beyond gene-oriented data analysis, research focus is shifting towards the functional association of genes, namely the metabolic pathways in which genes participate. This paper aims to identify the core genes of a specific pathway, based on a user-defined selection of genomes. To this end, a novel methodology has been developed that uses data from the KEGG database and applies the MCL clustering algorithm to identify clusters corresponding to different “layers” of genes, on either a phylogenetic or a functional level. The algorithm’s complexity, evaluated experimentally, is presented, and the results of a characteristic case study are discussed.
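For readers unfamiliar with MCL, the following is a minimal, generic sketch of Markov clustering (alternating expansion and inflation on a column-stochastic matrix). It is not the paper's implementation: the function names, the `inflation` value, the self-loop handling, and the toy six-gene graph are all assumptions made for illustration, and the paper's actual construction of the gene similarity graph from KEGG data is not shown here.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain square matrix product, kept dependency-free for clarity."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Generic Markov clustering sketch; returns sorted lists of node indices."""
    n = len(adjacency)
    # Add self-loops (a common MCL convention), then normalize columns.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow spreads along paths
        inflated = normalize_columns(
            [[x ** inflation for x in row] for row in expanded]
        )  # inflation: strong flows are boosted, weak flows decay
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Each non-empty row belongs to an attractor; its non-zero columns
    # are the members of that attractor's cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy graph: genes 0-2 and genes 3-5 form two tightly linked
# groups, joined only by a single edge between gene 2 and gene 3.
example = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
clusters = mcl(example)
print(clusters)
```

On this toy input the two densely connected groups end up in separate clusters. In a pathway setting, the nodes would be genes and the edge weights some similarity measure between them; a higher `inflation` value generally yields finer-grained clusters.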


Keywords: bioinformatics, metabolic pathways, clustering algorithm, phylogenetic analysis



Copyright information

© IFIP International Federation for Information Processing 2012

Authors and Affiliations

  • Dimitrios M. Vitsios (1)
  • Fotis E. Psomopoulos (1, 2)
  • Pericles A. Mitkas (1)
  • Christos A. Ouzounis (2)

  1. Dept. of Electrical and Computer Engineering, Aristotle University of Thessaloniki, Thessaloniki, Greece
  2. Institute of Agrobiotechnology, Center for Research and Technology Hellas, Thessaloniki, Greece
