Advances in the Application of Machine Learning Techniques in Drug Discovery, Design and Development

  • Conference paper
Applications of Soft Computing

Part of the book series: Advances in Intelligent and Soft Computing (AINSC, volume 36)

Abstract

Machine learning tools, in particular support vector machines (SVM), particle swarm optimisation (PSO) and genetic programming (GP), are increasingly used in pharmaceutical research and development. They are inherently suitable for use with ‘noisy’, high-dimensional (many-variable) data of the kind commonly used in cheminformatic (e.g. in silico screening), bioinformatic (e.g. biomarker studies using DNA chip data) and other types of drug research studies. These aspects are demonstrated via a review of their current usage and future prospects in the context of drug discovery activities.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Barrett, S.J., Langdon, W.B. (2006). Advances in the Application of Machine Learning Techniques in Drug Discovery, Design and Development. In: Tiwari, A., Roy, R., Knowles, J., Avineri, E., Dahal, K. (eds) Applications of Soft Computing. Advances in Intelligent and Soft Computing, vol 36. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-36266-1_10

  • DOI: https://doi.org/10.1007/978-3-540-36266-1_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29123-7

  • Online ISBN: 978-3-540-36266-1

  • eBook Packages: Engineering, Engineering (R0)
