Abstract
In the wake of gene-oriented data analysis in large-scale bioinformatics studies, the focus of research is now shifting towards the analysis of the functional association of genes, namely the metabolic pathways in which genes participate. This paper attempts to identify the core genes of a specific pathway, based on a user-defined selection of genomes. To this end, a novel methodology has been developed that uses data from the KEGG database and applies the MCL clustering algorithm to identify clusters corresponding to different “layers” of genes, at either a phylogenetic or a functional level. The complexity of the algorithm is evaluated experimentally, and the results of a characteristic case study are discussed.
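The abstract names MCL (the Markov Cluster algorithm of van Dongen) as the clustering step. As a rough illustration of how MCL partitions a graph, the sketch below implements its basic loop in plain Python: alternate expansion (a matrix power, letting flow spread along paths) and inflation (an element-wise power followed by column renormalization, strengthening strong flow) until the matrix stops changing. It is not the authors' implementation. It assumes a small, symmetric, hypothetical gene-similarity adjacency matrix that has already been built from the selected genomes (that construction is not shown), and the helper names `normalize_columns`, `matmul`, and `mcl_clusters`, as well as the defaults (expansion 2, inflation 2.0), are illustrative choices.

```python
from typing import List

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale each column so that it sums to 1 (a column-stochastic matrix)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                out[i][j] = m[i][j] / total
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain O(n^3) product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl_clusters(adj: Matrix, expansion: int = 2, inflation: float = 2.0,
                 max_iter: int = 100, tol: float = 1e-6) -> List[List[int]]:
    """Minimal MCL loop on a symmetric adjacency matrix (hypothetical helper)."""
    n = len(adj)
    # Add self-loops, then turn edge weights into transition probabilities.
    m = normalize_columns([[adj[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        prev = m
        # Expansion: compute M^expansion so flow spreads along longer paths.
        expanded = m
        for _power in range(expansion - 1):
            expanded = matmul(expanded, m)
        # Inflation: raise entries to a power and renormalize (strong flow wins).
        m = normalize_columns([[x ** inflation for x in row] for row in expanded])
        # Stop once the matrix no longer changes appreciably.
        if max(abs(m[i][j] - prev[i][j]) for i in range(n) for j in range(n)) < tol:
            break
    # Each row that still carries flow (an attractor) defines a cluster;
    # identical clusters produced by several rows are reported once.
    clusters: List[List[int]] = []
    seen = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-5)
        if members and members not in seen:
            seen.add(members)
            clusters.append(sorted(members))
    return clusters


if __name__ == "__main__":
    # Hypothetical graph: genes 0-2 form one group, genes 3-4 another.
    adj = [
        [0, 1, 1, 0, 0],
        [1, 0, 1, 0, 0],
        [1, 1, 0, 0, 0],
        [0, 0, 0, 0, 1],
        [0, 0, 0, 1, 0],
    ]
    print(mcl_clusters(adj))  # [[0, 1, 2], [3, 4]]
```

The inflation parameter is the main granularity control in MCL: higher values generally produce more, smaller clusters. The dense pure-Python matrix product is only suitable for toy inputs; real gene graphs would call for sparse matrices or the reference `mcl` program.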
Copyright information
© 2012 IFIP International Federation for Information Processing
About this paper
Cite this paper
Vitsios, D.M., Psomopoulos, F.E., Mitkas, P.A., Ouzounis, C.A. (2012). Multi-genome Core Pathway Identification through Gene Clustering. In: Iliadis, L., Maglogiannis, I., Papadopoulos, H., Karatzas, K., Sioutas, S. (eds) Artificial Intelligence Applications and Innovations. AIAI 2012. IFIP Advances in Information and Communication Technology, vol 382. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33412-2_56
DOI: https://doi.org/10.1007/978-3-642-33412-2_56
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33411-5
Online ISBN: 978-3-642-33412-2
eBook Packages: Computer Science; Computer Science (R0)