Genetic algorithms and self-organizing maps: a powerful combination for modeling complex QSAR and QSPR problems

  • Ersin Bayram
  • Peter Santago II
  • Rebecca Harris
  • Yun-De Xiao
  • Aaron J. Clauset
  • Jeffrey D. Schmitt


Modeling non-linear descriptor-target activity/property relationships with many dependent descriptors has been a long-standing challenge in the design of biologically active molecules. In an effort to address this problem, we couple the supervised self-organizing map with the genetic algorithm. Although self-organizing maps are non-linear, topology-preserving techniques that hold great potential for modeling and decoding relationships, the large number of descriptors in a typical quantitative structure-activity relationship or quantitative structure-property relationship analysis may lead to spurious correlations and/or difficulty in interpreting the resulting models. To reduce the number of descriptors to a manageable size, we chose the genetic algorithm for descriptor selection because of its flexibility and efficiency in solving complex problems. Feasibility studies were conducted using six different datasets of moderate-to-large size and moderate-to-great diversity, each with a different biological endpoint. Since favorable training set statistics do not necessarily indicate a highly predictive model, the quality of all models was confirmed by withholding a portion of each dataset for external validation. We also address the variability introduced into the modeling by dataset partitioning and by the stochastic nature of the combined genetic algorithm supervised self-organizing map method, using the z-score and other tests. Experiments show that the combined method provides accuracy comparable to that of the supervised self-organizing map alone while using significantly fewer descriptors in the models generated. We also observed consistently better results than with partial least squares models. We conclude that the combination of genetic algorithms with the supervised self-organizing map shows great potential as a quantitative structure-activity/property relationship modeling tool.
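
To make the descriptor-selection idea concrete, the following is a minimal sketch of a genetic algorithm searching over binary descriptor masks. It is not the authors' implementation: the fitness model is ordinary least squares, swapped in as a simple stand-in for the supervised self-organizing map, and the function names (`fitness`, `ga_select`), the size penalty, the GA settings, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np


def fitness(mask, X_tr, y_tr, X_val, y_val, penalty=0.01):
    """Score a descriptor subset (higher is better).

    Stand-in model: ordinary least squares, NOT a supervised SOM.
    The score is the negative validation RMSE minus a small penalty
    per selected descriptor (the penalty value is an assumption).
    """
    if not mask.any():
        return -np.inf  # an empty subset cannot be modeled
    A_tr = np.column_stack([X_tr[:, mask], np.ones(len(X_tr))])
    A_val = np.column_stack([X_val[:, mask], np.ones(len(X_val))])
    coef, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    rmse = np.sqrt(np.mean((A_val @ coef - y_val) ** 2))
    return -rmse - penalty * mask.sum()


def ga_select(X_tr, y_tr, X_val, y_val, pop_size=30, generations=40,
              mutation_rate=0.05, seed=0):
    """Search for a good boolean descriptor mask with a simple GA."""
    rng = np.random.default_rng(seed)
    n_desc = X_tr.shape[1]
    pop = rng.random((pop_size, n_desc)) < 0.5  # random initial masks
    best_mask, best_fit = pop[0].copy(), -np.inf
    for _ in range(generations):
        fits = np.array([fitness(m, X_tr, y_tr, X_val, y_val) for m in pop])
        i = int(np.argmax(fits))
        if fits[i] > best_fit:
            best_fit, best_mask = fits[i], pop[i].copy()
        # Binary tournament selection: the fitter of two random individuals wins.
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = pop[np.where(fits[a] >= fits[b], a, b)]
        # Uniform crossover with randomly shuffled partners.
        partners = parents[rng.permutation(pop_size)]
        take_parent = rng.random((pop_size, n_desc)) < 0.5
        children = np.where(take_parent, parents, partners)
        # Bit-flip mutation.
        children ^= rng.random((pop_size, n_desc)) < mutation_rate
        children[0] = best_mask  # elitism: carry the best mask forward
        pop = children
    return best_mask, best_fit


# Synthetic demonstration data (hypothetical): only descriptors 0 and 3 matter.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(120, 20))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * data_rng.normal(size=120)
mask, score = ga_select(X[:80], y[:80], X[80:], y[80:])
print("selected descriptors:", np.flatnonzero(mask))
```

In this sketch the validation split drives the search, so it cannot also serve as the external validation described above. A further portion of the data, never seen by the genetic algorithm, would have to be withheld for that purpose.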


genetic algorithm, neural networks, QSAR, QSPR, supervised self-organizing maps, variable selection





Copyright information

© Springer Science+Business Media, Inc. 2004

Authors and Affiliations

  • Ersin Bayram (1)
  • Peter Santago II (1)
  • Rebecca Harris (2)
  • Yun-De Xiao (2)
  • Aaron J. Clauset (3)
  • Jeffrey D. Schmitt (2)

  1. Department of Biomedical Engineering, Wake Forest University, Winston-Salem, USA
  2. Molecular Design Group, Targacept, Inc., Winston-Salem, USA
  3. Computer Science Department, University of New Mexico, Albuquerque, USA
