Advances in the Application of Machine Learning Techniques in Drug Discovery, Design and Development

Barrett, S. J.; Langdon, W. B.

doi:10.1007/978-3-540-36266-1_10


Part of the book series: Advances in Intelligent and Soft Computing (AINSC, volume 36)


Abstract

Machine learning tools, in particular support vector machines (SVMs), particle swarm optimisation (PSO) and genetic programming (GP), are increasingly used in pharmaceutical research and development. They are inherently suited to ‘noisy’, high-dimensional data (data with many variables), such as the data commonly used in cheminformatics (e.g. in silico screening), bioinformatics (e.g. biomarker studies using DNA chip data) and other types of drug research. These aspects are demonstrated by reviewing the current usage and future prospects of these techniques in the context of drug discovery activities.



Author information

Authors and Affiliations

Analysis Applications, Research and Technologies, GlaxoSmithKline R&D, Greenford Rd, Greenford, Middlesex, UB6 0HE, UK
S. J. Barrett
Computer Science, University of Essex, Colchester, CO4 3SQ, UK
W. B. Langdon


Editor information

Editors and Affiliations

Enterprise Integration School of Industrial and Manufacturing Science (SIMS), Cranfield University, Cranfield, Bedfordshire, MK43 0AL, UK
Ashutosh Tiwari & Rajkumar Roy
School of Chemistry, University of Manchester, Faraday Building, Sackville Street, PO Box 88, Manchester, M60 1QD, UK
Joshua Knowles
Centre for Transport & Society, Faculty of the Built Environment, University of the West of England, Frenchay Campus, Coldharbour Lane, Bristol, BS16 1QY, UK
Erel Avineri
School of Informatics, University of Bradford, Richmond Road, Bradford, West Yorkshire, BD7 1DP, UK
Keshav Dahal


About this paper

Cite this paper

Barrett, S.J., Langdon, W.B. (2006). Advances in the Application of Machine Learning Techniques in Drug Discovery, Design and Development. In: Tiwari, A., Roy, R., Knowles, J., Avineri, E., Dahal, K. (eds) Applications of Soft Computing. Advances in Intelligent and Soft Computing, vol 36. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-36266-1_10


DOI: https://doi.org/10.1007/978-3-540-36266-1_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29123-7
Online ISBN: 978-3-540-36266-1
eBook Packages: Engineering (R0)
