Using Recursive Partitioning Analysis to Evaluate Compound Selection Methods
The design and analysis of a screening set for high throughput screening are complex. We examine three statistical strategies for compound selection: random, clustering, and space-filling. We also examine two types of chemical descriptors: BCUTs and principal components of Dragon Constitutional descriptors. Based on the predictive power of multiple-tree recursive partitioning, we reached the following tentative conclusions. Random designs appear to be as good as clustering and space-filling designs. For analysis, BCUTs appear to be better than principal component scores based on Constitutional descriptors. We also confirm previous results that model-based selection of compounds can lead to improved screening hit rates.
Key Words: Decision trees; high throughput screening; initial screening sets; random recursive partitioning; recursive partitioning; sequential screening
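The three selection strategies named in the abstract (random, clustering, and space-filling) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' procedure. The function names are hypothetical. The "descriptors" are synthetic 2-D points rather than BCUTs. The clustering step is a toy k-means and the space-filling step is a greedy maximin rule. The `coverage` helper is only a simple geometric summary; the study itself compares designs by the predictive power of multiple-tree recursive partitioning, which is not reproduced here.

```python
import random
from math import dist

def random_design(points, k, rng):
    """Pick k compounds uniformly at random (without replacement)."""
    return rng.sample(range(len(points)), k)

def space_filling_design(points, k, rng):
    """Greedy maximin: repeatedly add the point farthest from those already chosen."""
    chosen = [rng.randrange(len(points))]
    # min_d[i] is the distance from point i to its nearest chosen point
    min_d = [dist(p, points[chosen[0]]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=min_d.__getitem__)
        chosen.append(nxt)
        min_d = [min(d, dist(p, points[nxt])) for d, p in zip(min_d, points)]
    return chosen

def cluster_design(points, k, rng, iters=10):
    """Toy k-means, then take the member nearest each centroid as representative.

    An empty cluster contributes no representative, so fewer than k
    compounds may be returned.
    """
    dims = range(len(points[0]))
    centroids = [points[i] for i in rng.sample(range(len(points)), k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda j: dist(p, centroids[j]))
            groups[nearest].append(i)
        centroids = [
            tuple(sum(points[i][d] for i in g) / len(g) for d in dims) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return [
        min(g, key=lambda i: dist(points[i], centroids[j]))
        for j, g in enumerate(groups) if g
    ]

def coverage(points, design):
    """Mean distance from each library point to its nearest selected compound."""
    return sum(min(dist(p, points[i]) for i in design) for p in points) / len(points)

# Synthetic library: 200 compounds described by two made-up descriptor values.
rng = random.Random(0)
library = [(rng.random(), rng.random()) for _ in range(200)]

designs = {
    "random": random_design(library, 10, rng),
    "clustering": cluster_design(library, 10, rng),
    "space-filling": space_filling_design(library, 10, rng),
}
for name, design in designs.items():
    print(f"{name:14s} coverage = {coverage(library, design):.3f}")
```

A lower `coverage` value means the selected compounds sit closer, on average, to the rest of the library. It says nothing about hit rates or model quality, which is the question the chapter actually addresses.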