Machine learning tools, in particular support vector machines (SVM), particle swarm optimisation (PSO) and genetic programming (GP), are increasingly used in pharmaceutical research and development. They are inherently suited to ‘noisy’, high-dimensional (many-variable) data of the kind commonly encountered in cheminformatic (e.g. in silico screening), bioinformatic (e.g. biomarker studies using DNA chip data) and other types of drug research studies. These aspects are demonstrated through a review of their current usage and future prospects in the context of drug discovery activities.
- Support Vector Machine
- Particle Swarm Optimization
- Feature Selection
- Particle Swarm
- Drug Discovery
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
© 2006 Springer-Verlag Berlin Heidelberg
Cite this paper
Barrett, S.J., Langdon, W.B. (2006). Advances in the Application of Machine Learning Techniques in Drug Discovery, Design and Development. In: Tiwari, A., Roy, R., Knowles, J., Avineri, E., Dahal, K. (eds) Applications of Soft Computing. Advances in Intelligent and Soft Computing, vol 36. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-36266-1_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29123-7
Online ISBN: 978-3-540-36266-1