Abstract
The influence of preprocessing molecular descriptor vectors was analyzed for drug/nondrug classification by artificial neural networks. Descriptor vectors were formed from molecular properties. Two types of neural networks were used: supervised multilayer neural nets trained with the back-propagation algorithm, and unsupervised self-organizing maps (Kohonen maps). Data were preprocessed by logistic scaling and histogram equalization. For both types of neural networks, the preprocessing step significantly improved classification compared to nonstandardized data. Performance was measured as prediction mean square error and Matthews correlation coefficient in the case of supervised learning, and as quantization error in the case of unsupervised learning. The results demonstrate that appropriate data preprocessing is an essential step in solving classification tasks.
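The two preprocessing steps named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the forms below are assumptions: logistic scaling is taken as standardization followed by the logistic function, and histogram equalization as replacing each value by its rank among the distinct values, scaled to [0, 1]. The function names and the example descriptor column are hypothetical.

```python
import math
from statistics import mean, pstdev


def logistic_scale(values):
    """Standardize a descriptor column, then squash it into (0, 1).

    Assumed form: x' = 1 / (1 + exp(-(x - mean) / std)).
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to the midpoint.
        return [0.5 for _ in values]
    return [1.0 / (1.0 + math.exp(-(v - mu) / sigma)) for v in values]


def histogram_equalize(values):
    """Replace each value by its rank among distinct values, scaled to [0, 1].

    Assumed form: a rank-based transform that flattens the histogram.
    """
    distinct = sorted(set(values))
    if len(distinct) == 1:
        return [0.0 for _ in values]
    rank = {v: i for i, v in enumerate(distinct)}
    top = len(distinct) - 1
    return [rank[v] / top for v in values]


# Hypothetical descriptor column with one outlier (illustrative values only).
column = [180.2, 250.3, 310.4, 250.3, 1200.0]
scaled = logistic_scale(column)
equalized = histogram_equalize(column)
```

Both transforms are monotonic and bound the output range, so a single outlier such as the last value in `column` no longer dominates the scale of the descriptor.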
Figure: Drug/nondrug classification by SOM
Abbreviations
- BP: Back-propagation algorithm
- GDA: Gradient descent with adaptive learning rate
- GDM: Gradient descent with momentum
- HTS: High-throughput screening
- LM: Levenberg–Marquardt
- mcc: Matthews correlation coefficient
- MFFN: Multilayer feedforward neural network
- mse: Mean square error
- QE: Quantization error
- QSAR: Quantitative structure–activity relationship
- RP: Resilient back-propagation algorithm
- SOM: Self-organizing map
- SVM: Support vector machine
- TE: Topology error
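Three of the measures abbreviated above (mse, mcc, QE) can be sketched as follows. These are the standard textbook definitions, not code from the study. The quantization error is assumed here to be the mean Euclidean distance between each sample and its nearest map unit, and all function and variable names are hypothetical.

```python
import math


def mse(targets, outputs):
    """Mean square error between target values and network outputs."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


def quantization_error(samples, codebook):
    """Mean distance from each sample to its nearest codebook (SOM unit) vector."""
    total = 0.0
    for x in samples:
        total += min(math.dist(x, w) for w in codebook)
    return total / len(samples)
```

An mcc of 1 indicates perfect prediction, 0 is no better than random, and -1 is total disagreement; lower mse and QE values indicate a better fit.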
Acknowledgement
Jens Sadowski is thanked for providing his drug/nondrug data for this study. This work was supported by the Beilstein-Institut zur Förderung der Chemischen Wissenschaften, Frankfurt.
Cite this article
Givehchi, A., Schneider, G. Impact of descriptor vector scaling on the classification of drugs and nondrugs with artificial neural networks. J Mol Model 10, 204–211 (2004). https://doi.org/10.1007/s00894-004-0186-9
DOI: https://doi.org/10.1007/s00894-004-0186-9