Developing an in-house system to support combinatorial chemistry


To support the special data handling and design problems that arise in combinatorial chemistry, extensions to classical chemical information and molecular design systems are required. In this article, we describe the principles and construction of a proprietary software system to support combinatorial chemistry, which was developed at Ciba-Geigy and is now used at Novartis. The system supports the registration of combinatorial libraries and their building blocks, together with associated administrative information, assay results, and computed data. Structure similarity techniques are used to search and compare combinatorial libraries. The system can also be used to design libraries, either manually or by computational selection of structurally diverse sets of building blocks.
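
As a rough illustration of the similarity search and diversity selection mentioned above, the following minimal sketch compares fingerprints with the Tanimoto coefficient and picks a spread-out subset of building blocks. It makes several assumptions that the abstract does not confirm: fingerprints are modelled as plain Python sets of on-bit indices, Tanimoto is used as the similarity measure, and the selection uses a simple greedy max-min heuristic rather than the D-optimal design discussed in note 30. The names `tanimoto` and `pick_diverse` and the example bit sets are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 1.0  # convention chosen for this sketch: two empty fingerprints are identical
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def pick_diverse(fingerprints, count):
    """Greedy max-min selection (an assumed heuristic, not the authors' method).

    Starts from the alphabetically first building block, then repeatedly adds
    the candidate whose most similar already-chosen neighbour is least similar.
    """
    names = sorted(fingerprints)
    if not names or count <= 0:
        return []
    chosen = [names[0]]
    while len(chosen) < min(count, len(names)):
        candidates = [n for n in names if n not in chosen]
        best = max(
            candidates,
            key=lambda n: min(
                1.0 - tanimoto(fingerprints[n], fingerprints[c]) for c in chosen
            ),
        )
        chosen.append(best)
    return chosen


# Made-up fingerprints for three hypothetical building blocks.
example = {"a": {1, 2, 3}, "b": {1, 2, 4}, "c": {7, 8, 9}}
print(tanimoto(example["a"], example["b"]))  # 0.5
print(pick_diverse(example, 2))              # ['a', 'c']
```

The max-min heuristic needs only pairwise similarities, so it works directly on fingerprints without first embedding the building blocks in a Cartesian space.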



  1. 1.

    Felder, E. and Poppinger, D., Adv. Drug Res., 30 (1997) 111.


  2. 2.

    Martin, E.J., Blaney, J.M., Siani, M.A., Spellmeyer, D.C., Wong, A.K. and Moos, W.H., J. Med. Chem., 38 (1995) 1431.


  3. 3.

    James, C.A., and Weininger, D., Daylight Software Manual, v. 4.42, Daylight Chemical Information Systems Inc., Irvine, CA, U.S.A., 1996; see also


  4. 4.a.

    Weininger, D., J. Chem. Inf. Comput. Sci., 28 (1988) 31.


  5. 4.b.

    Weininger, D., J. Chem. Inf. Comput. Sci., 28 (1988) 97.


  6. 5.

    In the Daylight-based chemical data mining system we had developed earlier, loading the database into memory took about 45 min for 1.2 million compounds and was usually done once per night.

  7. 6.

    An interesting design problem in this area, which nobody seems to have addressed, is how to plan the initial experiments such that they yield the maximum information about the practical scope of the reaction, given the theoretical scope (the universe of available reagents).

  8. 7.

    Synopsys SPS database:; MDL SPORE database:

  9. 8.

    Furka, A., Drug Dev. Res., 33 (1994) 90.


  10. 9.

    Delaney, J., presented at the 1997 Daylight MUG meeting; cf.

  11. 10.

    Siani, M.A., Weininger, D., James, C.A. and Blaney, J.M., J. Chem. Inf. Comput. Sci., 34 (1996) 1026.


  12. 11.

    This is because the SMILES representation of a combinatorial library replaces all variable parts of the structure with a wildcard character, and the TDTs then group together libraries that have essentially no chemical relation.

  13. 12.

    Sigma-Aldrich Library of Rare Chemicals,

  14. 13.

    Maybridge catalogue,

  15. 14.

    The main problem that arises here is the conversion of the similarity matrix into a Cartesian space by multidimensional scaling [27]. In our implementation, this step is limited to sets of 2000 building blocks of each chemical type.

  16. 15.a.

    Carhart, R.E., Smith, D.H. and Venkataraghavan, R., J. Chem. Inf. Comput. Sci., 25 (1985) 64.


  17. 15.b.

    Taylor, R., J. Chem. Inf. Comput. Sci., 35 (1995) 59.


  18. 15.c.

    Moreau, G., Conference Proceedings ‘Synthetic Chemical Libraries in Drug Discovery’, London, U.K., October 30–31, 1995.


  19. 16.

    Diversity design at the enumerated library level is algorithmically difficult for two reasons. Firstly, the design space is very large. Secondly, it is not sufficient to generate sets of otherwise unrelated compounds. These sets must reflect the relation between structures that is imposed by the experimental format of high-throughput synthesis (multiparallel or mix-and-split), because, in actual high-throughput screening practice, selecting individual compounds out of larger arrays is expensive and is usually not done.

  20. 17.

    Young, S., Farmen, M. and Rusinko, A., Network Science, August 1996.

  21. 18.

    Rohde, B., unpublished work, 1996.

  22. 19.

    The 4.51 release of the Daylight software allows efficient and precise searches over enumerated small (<1000 compounds) libraries.

  23. 20.

    The explanation given here is a little simplified. Interested readers should refer to the Daylight theory manual [3].

  24. 21.

    This coding employs superimposition of bit positions, in order to make more efficient use of computer storage, cf.

  25. 22.

    The SMARTS pattern-matching language is described in Weininger, D. et al.,

  26. 23.

    Topological descriptors are from the MolConnX program; cf. Kier, L.B. and Hall, L.H., Molecular Connectivity in Structure-Activity Analysis, Wiley, New York, NY, 1986.


  27. 24.

    Theoretical values for molecular refraction and octanol/water partition coefficient, computed with the Daylight programs cmr and clogp.

  28. 25.

    HOMO and LUMO energies. Quantum-chemical descriptors are computed with MOPAC (Stewart, J.J.P., J. Comput.-Aided Mol. Design, 4 (1990) 1) on single 3D conformations obtained with CONCORD (Pearlman, R.S., Chem. Design Autom. News, 2 (1987) 1). Dipole moments are not included because of their conformational sensitivity.


  29. 26.

    Cf. Willett, P., this volume (pp. 1–11).

  30. 27.

    We largely follow the description published by Martin et al. [2], with the difference that we use a noniterative MDS algorithm.

  31. 28.

    Program objects are described in Weininger, D. et al.,

  32. 29.

    Lajiness, M., In Rouvray, D.H. (Ed.) Computational Chemical Graph Theory, Nova Science Publishers, New York, NY, 1990, pp. 299–316. See also Lajiness, M., this volume (pp. 65–84).


  33. 30.

    In this situation, d-optimal design tends to select points from the periphery of the design space, i.e. outliers, which is usually not desired. We have therefore implemented a generalization of the d-optimal design algorithm (‘locally d-optimal design’), which does not suffer from this defect. It optimizes the d-optimal design scores for each building block (BB) and its n nearest neighbors, where n is the dimensionality of the design space. When selecting n+1 points, it produces the same results as d-optimal design. For larger selections, it produces distributions which are intuitively more convincing. However, the algorithm is much slower than d-optimal design.

  34. 31.

    In fact, the Novartis Crop Protection chemical structure registration system, which used to be based on MACCS and ISIS, has recently been replaced by an Oracle application. There is no technical reason why this approach cannot be extended to registering combinatorial libraries.
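
Notes 14 and 27 refer to converting a similarity matrix into Cartesian coordinates with a noniterative MDS algorithm. The notes do not say which algorithm was used, so the sketch below is only one possibility: classical (Torgerson) scaling, which double-centres the squared distances and reads coordinates off the leading eigenvectors. Everything here is an assumption for illustration, including the function names, the conversion of similarities to distances (left to the caller), and the use of power iteration with deflation, chosen only to keep the example free of external libraries.

```python
import math


def double_center(dist):
    """Return B = -1/2 * J * D^2 * J for a symmetric distance matrix D."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]  # equals column means (D symmetric)
    grand_mean = sum(row_mean) / n
    return [
        [-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean) for j in range(n)]
        for i in range(n)
    ]


def top_eigenpair(mat, iters=500):
    """Dominant eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    n = len(mat)
    vec = [1.0 + i for i in range(n)]  # deterministic, non-constant start vector
    for _ in range(iters):
        nxt = [sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            return 0.0, vec
        vec = [x / norm for x in nxt]
    value = sum(
        vec[i] * sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n)
    )
    return value, vec


def classical_mds(dist, dims=2):
    """Embed n objects in `dims` dimensions from their pairwise distances."""
    b = double_center(dist)
    n = len(b)
    coords = [[] for _ in range(n)]
    for _ in range(dims):
        value, vec = top_eigenpair(b)
        if value <= 1e-12:  # no usable variance left: pad with zeros
            for row in coords:
                row.append(0.0)
            continue
        scale = math.sqrt(value)
        for i in range(n):
            coords[i].append(scale * vec[i])
        for i in range(n):  # deflate so the next pass finds the next eigenpair
            for j in range(n):
                b[i][j] -= value * vec[i] * vec[j]
    return coords


# Four corners of a 3 x 4 rectangle; the embedding should reproduce the distances.
corners = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0)]
distances = [[math.dist(p, q) for q in corners] for p in corners]
embedded = classical_mds(distances, dims=2)
```

This sketch assumes the distances are close to Euclidean; for dissimilarities that are not, the centred matrix can have large negative eigenvalues, which plain power iteration does not handle. The dense n × n matrix also makes the cost grow quickly with the number of building blocks, which is consistent with the size limit mentioned in note 14.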


Author information



Corresponding author

Correspondence to Dieter Poppinger.

Cite this article

Gobbi, A., Poppinger, D. & Rohde, B. Developing an in-house system to support combinatorial chemistry. Perspectives in Drug Discovery and Design 7, 131–158 (1996).


Key words

  • combinatorial library diversity
  • combinatorial library fingerprints
  • combinatorial library similarity
  • design of combinatorial libraries
  • registration
  • searching