Prediction of Molecular Bioactivity for Drug Design Using a Decision Tree Algorithm
A machine learning-based approach to predicting molecular bioactivity for new drug design is proposed. Two aspects of the task are addressed: feature subset selection and cost-sensitive classification. They respectively cope with the very large number of features and the unbalanced class distribution in a dataset of drug candidates. We designed a pattern classifier with both capabilities, based on information theory and re-sampling techniques. Experimental results demonstrate the feasibility of the proposed approach. In particular, its classification accuracy was higher than that of the winner of the KDD Cup 2001 competition.
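The abstract does not specify the exact selection criterion or re-sampling scheme, so the following is only a minimal sketch under stated assumptions. It assumes the features are discrete (for example binary structural bits), ranks them by information gain (the Gini index from the keyword list is included as an alternative impurity measure), and approximates cost-sensitive training by random oversampling of the rare class. Random oversampling is one re-sampling technique among several and is not necessarily the one the authors used. All function names and the toy data are hypothetical illustrations, not the authors' code.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index (impurity) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_reduction(column, labels, impurity=entropy):
    """Impurity drop from splitting on one discrete feature column.
    With impurity=entropy this is the information gain."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    weighted = sum(len(g) / n * impurity(g) for g in groups.values())
    return impurity(labels) - weighted

def select_top_k(X, y, k, impurity=entropy):
    """Hypothetical helper: indices of the k features with the largest
    impurity reduction (a simple filter-style feature subset selection)."""
    n_features = len(X[0])
    scores = [impurity_reduction([row[j] for row in X], y, impurity)
              for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]

def oversample_minority(X, y, minority, seed=0):
    """Hypothetical helper: randomly duplicate rows of the `minority` class
    until it is as large as the rest of the data (assumes a binary problem
    and at least one minority row)."""
    rng = random.Random(seed)
    minority_rows = [(x, c) for x, c in zip(X, y) if c == minority]
    majority_count = sum(1 for c in y if c != minority)
    extra = [rng.choice(minority_rows)
             for _ in range(majority_count - len(minority_rows))]
    pairs = list(zip(X, y)) + extra
    return [x for x, _ in pairs], [c for _, c in pairs]

# Toy usage (made-up data): 6 compounds, 3 binary features,
# labels "A" (active) and "I" (inactive).
X = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 0], [0, 1, 0], [0, 0, 1]]
y = ["A", "A", "I", "I", "I", "I"]
keep = select_top_k(X, y, k=1)                 # → [0]; feature 0 separates the classes
X_sel = [[row[j] for j in keep] for row in X]
X_bal, y_bal = oversample_minority(X_sel, y, minority="A")  # 4 "A" rows, 4 "I" rows
```

In this sketch re-sampling is applied after feature selection and only to data used for training. When accuracy is estimated by cross-validation, duplicating minority rows before splitting would let copies of the same compound fall into both training and test folds and inflate the estimate.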
Keywords: Feature Selection; Information Gain; Feature Subset; Gini Index; Feature Subset Selection