Abstract
Efficient in silico screening approaches can provide valuable hints about the biological functions of candidate compounds, which can help in screening functional compounds in basic research on metabolic pathways or in drug discovery. Here, we apply a machine learning method (the Nearest Neighbor Algorithm), based on the functional group composition of compounds, to the analysis of metabolic pathways. This method can quickly map small chemical molecules to the metabolic pathway to which they most likely belong. A set of 2,764 compounds from 11 major classes of metabolic pathways was selected for study. The overall prediction rate reached 73.3%, indicating that the functional group composition of compounds is related to their biological metabolic functions.
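The idea of mapping a compound to a pathway class by its nearest neighbor in functional-group space can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not state the functional-group vocabulary, the distance measure, or the data encoding, so the group list `GROUPS`, the cosine-based distance, the helper names, and the three toy training compounds below are hypothetical and are not taken from the study.

```python
from math import sqrt

# Hypothetical functional-group vocabulary (illustrative, not the paper's list).
GROUPS = ["hydroxyl", "carboxyl", "amino", "phosphate", "carbonyl"]


def to_vector(group_counts):
    """Turn a {group: count} mapping into a fixed-order composition vector."""
    return [float(group_counts.get(g, 0)) for g in GROUPS]


def cosine_distance(u, v):
    """Return 1 - cos(u, v); assumed metric, the abstract does not name one."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # treat an empty composition as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def predict_pathway(query_counts, training):
    """Assign the query the pathway label of its single nearest neighbor."""
    query_vec = to_vector(query_counts)
    nearest = min(
        training,
        key=lambda item: cosine_distance(query_vec, to_vector(item[0])),
    )
    return nearest[1]


# Toy training set: made-up compositions and labels, for demonstration only.
TRAINING = [
    ({"hydroxyl": 5, "carbonyl": 1}, "carbohydrate metabolism"),
    ({"amino": 1, "carboxyl": 1}, "amino acid metabolism"),
    ({"phosphate": 3, "hydroxyl": 2, "amino": 1}, "nucleotide metabolism"),
]

if __name__ == "__main__":
    print(predict_pathway({"hydroxyl": 4, "carbonyl": 1}, TRAINING))
```

A single nearest neighbor keeps the sketch close to the method named in the abstract; a real pipeline would also need a way to extract functional groups from chemical structures, which is outside the scope of this example.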
Cite this article
Cai, YD., Qian, Z., Lu, L. et al. Prediction of compounds’ biological function (metabolic pathways) based on functional group composition. Mol Divers 12, 131–137 (2008). https://doi.org/10.1007/s11030-008-9085-9
DOI: https://doi.org/10.1007/s11030-008-9085-9