Partitioning in Binary-Transformed Chemical Descriptor Spaces

  • Jeffrey W. Godden
  • Jürgen Bajorath
Part of the Methods in Molecular Biology™ book series (MIMB, volume 275)


Here we describe a statistically based partitioning method called median partitioning (MP), which transforms the value distributions of molecular property descriptors into a binary classification scheme. MP fundamentally differs from partitioning approaches that involve dimension reduction of chemical spaces, such as cell-based partitioning, because it operates directly in the original, albeit simplified, chemical space. Modified versions of the MP algorithm have been implemented and successfully applied in diversity selection, compound classification, and virtual screening. These applications have demonstrated that dimension reduction techniques, although elegant in their design, are not necessarily required for effective partitioning of molecular datasets. An attractive feature of statistical partitioning approaches such as decision tree methods or MP is their computational efficiency, which is becoming an important criterion for the analysis of compound databases containing millions of molecules.
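The binary transformation summarized above can be illustrated with a minimal sketch. It assumes that each descriptor is split at its median over the compound set, with values strictly above the median mapped to 1 and all others (including values equal to the median) mapped to 0. Compounds sharing the same bit string then fall into the same partition, so d descriptors yield at most 2^d partitions. The function names (`binarize`, `median_partition`, `pick_representatives`), the tie rule, the first-member representative choice and the toy descriptor values are illustrative assumptions, not the authors' published implementation.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Sequence, Tuple

Bits = Tuple[int, ...]


def binarize(descriptors: Sequence[Sequence[float]]) -> List[Bits]:
    """Map each compound's descriptor vector to a bit string.

    Assumption: bit j is 1 if descriptor j lies strictly above that
    descriptor's median over the whole compound set, otherwise 0.
    """
    n_desc = len(descriptors[0])
    # One median per descriptor, computed over all compounds.
    medians = [median(row[j] for row in descriptors) for j in range(n_desc)]
    return [tuple(int(row[j] > medians[j]) for j in range(n_desc))
            for row in descriptors]


def median_partition(descriptors: Sequence[Sequence[float]]) -> Dict[Bits, List[int]]:
    """Group compound indices by identical bit strings (one partition each)."""
    partitions: Dict[Bits, List[int]] = defaultdict(list)
    for idx, bits in enumerate(binarize(descriptors)):
        partitions[bits].append(idx)
    return dict(partitions)


def pick_representatives(partitions: Dict[Bits, List[int]]) -> List[int]:
    """Hypothetical diversity selection: first compound of each partition."""
    return sorted(members[0] for members in partitions.values())


if __name__ == "__main__":
    # Toy data: two hypothetical descriptors per compound.
    compounds = [[120.0, 1.2], [310.0, 3.5], [150.0, 0.8],
                 [290.0, 2.9], [200.0, 4.1]]
    parts = median_partition(compounds)
    print(parts)  # {(0, 0): [0, 2], (1, 1): [1], (1, 0): [3], (0, 1): [4]}
    print(pick_representatives(parts))  # [0, 1, 3, 4]
```

Once the medians are known, each compound is assigned to a partition with a single comparison per descriptor, so the cost of the assignment step grows linearly with the number of compounds. This is the kind of efficiency that matters for databases containing millions of molecules.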

Key Words

Biological activity; chemical descriptors; chemical spaces; classification methods; compound databases; decision trees; diversity selection; partitioning algorithms; space transformation; statistics; statistical medians



Copyright information

© Humana Press Inc. 2004

Authors and Affiliations

  • Jeffrey W. Godden (1)
  • Jürgen Bajorath (1, 2)
  1. Computer Aided Drug Discovery, Albany Molecular Research Inc., Bothell Research Center, Bothell, WA, USA
  2. Department of Biological Structure, University of Washington, Seattle, USA
