Abstract
Classifying the protein target specificity of small molecules is an important step in computational drug design and development. Improved classification models for small chemical molecules enable rapid screening of large compound databases. Here, we present a k-nearest neighbors approach with genetic algorithm feature optimization for selecting small-molecule protein inhibitors. The method is trained on selected, diverse activity classes of the MDL Drug Data Report (MDDR), with ligands described by simple two-dimensional atom pair chemical descriptors. The accuracy of inhibitor identification is presented in confusion tables with calculated recall and precision values. For selected target types, precision exceeds 70% and recall reaches 40%. Consequently, the method can be readily applied to large commercial compound collections in a drug development campaign to significantly reduce the number of ligands passed on to costly experimental validation.
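The classification step and the reported metrics can be illustrated with a minimal sketch. It is not the published kNNsim implementation. It assumes that each ligand is reduced to a set of atom pair feature strings, that similarity is measured with the Tanimoto coefficient, and that the genetic algorithm's result is a subset of features to keep. The names `tanimoto`, `knn_predict`, `precision_recall`, and the `selected` parameter are hypothetical and chosen only for illustration.

```python
from collections import Counter


def tanimoto(a, b):
    """Tanimoto similarity between two feature sets (0.0 if both are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def knn_predict(query, training, k=3, selected=None):
    """Label a query ligand by majority vote among its k most similar neighbors.

    training: list of (feature_set, label) pairs.
    selected: optional set of features to keep, standing in for the subset
              a genetic algorithm would choose (assumption); None keeps all.
    """
    def mask(features):
        return features if selected is None else features & selected

    q = mask(query)
    # Rank training ligands from most to least similar to the query.
    ranked = sorted(training, key=lambda item: tanimoto(q, mask(item[0])), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def precision_recall(true_labels, predicted_labels, positive):
    """Precision and recall for one class, from the confusion-table counts."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

In this sketch a genetic algorithm would search over candidate `selected` subsets and score each one by the precision and recall that `knn_predict` achieves on held-out ligands. The search itself is omitted here.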
Acknowledgements
This work was supported by the EC within the BioSapiens (LHSG-CT-2003–503265) and SEPSDA (SP22-CT-2004–003831) 6FP projects, and by the Polish Ministry of Education and Science (PBZ-MNiI-2/1/2005 and an MNII ordinary research grant to DP).
Cite this article
Plewczynski, D. kNNsim: k-Nearest neighbors similarity with genetic algorithm features optimization enhances the prediction of activity classes for small molecules. J Mol Model 15, 591–596 (2009). https://doi.org/10.1007/s00894-008-0349-1
DOI: https://doi.org/10.1007/s00894-008-0349-1