Molecular Diversity, Volume 6, Issue 2, pp 135–147

Use of alignment-free molecular descriptors in diversity analysis and optimal sampling of molecular libraries

  • Fabien Fontaine
  • Manuel Pastor
  • Hugo Gutiérrez-de-Terán
  • Juan J. Lozano
  • Ferran Sanz (corresponding author)


The selection of a sample of diverse compounds is a common strategy for exploring large molecular libraries. However, the success of such an approach depends on the selection of relevant molecular descriptors and the use of appropriate sampling methods. In the context of pharmaceutical research, the molecular descriptors should be based on physicochemical properties related to the pharmacological behaviour of the compounds. In this sense, the alignment-free GRIND and VolSurf molecular descriptors are promising candidates, since they have been used successfully to model both the pharmacodynamic and the pharmacokinetic properties of drugs. This work describes the use of such descriptors in the diversity sampling of a library of primary amines and compares the results with those obtained in a previous study that used quantum-mechanical descriptors. As in the previous work, principal component (PC) analysis was applied to reduce the dimensionality of the original descriptors and to remove redundant information from them, and the compounds were sampled on the basis of k-means clustering in the space of the selected PCs. The results of the present study show that VolSurf and GRIND provide sampling of similar quality with regard to global molecular features such as hydrophilicity; however, the topology of the compounds is captured differently. The similarity between particular compounds depends strongly on the original descriptors used. Nevertheless, all the sample selections made in the PC space after k-means clustering yield the same apparent diversity relative to the whole dataset. The results indicate that no single set of descriptors is best on a diversity basis; the choice of descriptors must be based on the drug features to be investigated.
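
The sampling procedure summarised above (PC analysis of the descriptor matrix, then k-means clustering on the retained PC scores, then drawing compounds from each cluster) can be sketched in dependency-free Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: autoscaling of the descriptors, power-iteration PCA, deterministic farthest-first seeding of k-means, and picking the compound closest to each cluster centroid are all choices made here for the example, and `toy_descriptors` is hypothetical data standing in for a real GRIND or VolSurf descriptor matrix.

```python
import math
import random

# Sketch of the pipeline: autoscale -> PCA -> k-means in PC space -> one
# representative per cluster. Seeding and "closest to centroid" picking are
# illustrative assumptions, not taken from the article.


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def autoscale(X):
    """Centre each descriptor column and scale it to unit variance."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = []
    for j in range(p):
        var = sum((row[j] - means[j]) ** 2 for row in X) / (n - 1)
        sds.append(math.sqrt(var) or 1.0)  # leave constant columns unscaled
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]


def pca_scores(X, n_components, iters=500, seed=0):
    """Scores of the (already centred) rows of X on the leading PCs, found by
    power iteration with deflation of the covariance matrix."""
    n, p = len(X), len(X[0])
    rng = random.Random(seed)
    C = [[sum(r[i] * r[j] for r in X) / (n - 1) for j in range(p)] for i in range(p)]
    components = []
    for _ in range(n_components):
        v = [rng.uniform(-1.0, 1.0) for _ in range(p)]  # generic start vector
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in this direction
            v = [x / norm for x in w]
        eigval = sum(v[i] * C[i][j] * v[j] for i in range(p) for j in range(p))
        components.append(v)
        # Deflation: remove the variance already explained by this component.
        C = [[C[i][j] - eigval * v[i] * v[j] for j in range(p)] for i in range(p)]
    return [[sum(r[j] * v[j] for j in range(p)) for v in components] for r in X]


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-first seeding."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda pt: min(sq_dist(pt, c) for c in centroids))
        centroids.append(list(far))

    def assign(cents):
        return [min(range(k), key=lambda c: sq_dist(pt, cents[c])) for pt in points]

    for _ in range(iters):
        labels = assign(centroids)
        updated = []
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            # Keep the previous centroid if a cluster happens to empty out.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return assign(centroids), centroids


def diverse_sample(descriptors, n_components, k):
    """Indices of one compound per k-means cluster: the compound closest to
    its cluster centroid in PC space."""
    scores = pca_scores(autoscale(descriptors), n_components)
    labels, centroids = kmeans(scores, k)
    picked = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            picked.append(min(members, key=lambda i: sq_dist(scores[i], centroids[c])))
    return sorted(picked)


# Hypothetical toy data: 9 compounds x 4 descriptors in three obvious groups.
toy_descriptors = [
    [0.0, 0.1, 0.0, 0.2], [0.2, 0.0, 0.1, 0.0], [0.1, 0.2, 0.2, 0.1],
    [5.0, 5.1, 0.0, 0.1], [5.2, 4.9, 0.1, 0.0], [4.9, 5.0, 0.2, 0.2],
    [0.1, 0.0, 5.0, 5.2], [0.0, 0.2, 5.1, 4.9], [0.2, 0.1, 4.8, 5.0],
]

if __name__ == "__main__":
    print(diverse_sample(toy_descriptors, n_components=2, k=3))
```

On the toy data the sketch should return one compound index from each of the three groups. In practice a linear-algebra library (for example NumPy's `numpy.linalg.svd`) would replace the hand-written power iteration; the pure-Python version only serves to make each step of the procedure visible.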

Keywords: Almond · diversity · GRIND · k-means clustering · molecular descriptors · molecular library sampling · principal component analysis · quantum-mechanical descriptors · VolSurf


Copyright information

© Kluwer Academic Publishers 2003

Authors and Affiliations

  • Fabien Fontaine (1)
  • Manuel Pastor (1)
  • Hugo Gutiérrez-de-Terán (1)
  • Juan J. Lozano (1)
  • Ferran Sanz (2), corresponding author

  1. Research Group on Biomedical Informatics (GRIB), IMIM, Universitat Pompeu Fabra, Barcelona, Spain
  2. Research Group on Biomedical Informatics (GRIB), IMIM, Universitat Pompeu Fabra, Barcelona, Spain
