An Integrated Approach to 2-D and 3-D Similarity Searching for the Cambridge Structural Database (CSD)
Similarity searching in chemical databases depends crucially upon the chosen molecular attribute sets. The current 2-D implementation in the Cambridge Structural Database System uses substructural bit screens as attributes. These contain chemical information at a restricted connectivity level around each atom or bond; the only larger pattern units represented are rings and ring systems. Gross pattern attributes can, however, be assigned in terms of inter-nodal bond separation frequencies established using a shortest-path algorithm. This information can be used alone, or in combination with the chemical attributes, to provide an alternative or enhanced approach to the 2-D problem. In 3-D, similarity concepts have meaning at both the substructural and the full structural level. A specific chemical substructure may exist in a variety of 3-D conformations. A modified Minkowski metric based on torsion-angle descriptors is used to compare 3-D shapes. This yields a 1-D ‘conformational spectrum’ graphical representation in which different conformers often appear in well-separated groups for rapid identification. At the full molecular level, comparison of complete distance matrices provides the most complete solution. However, because of the vast computational effort this requires, the distance matrix may be reduced to a distance-frequency distribution. Ultimately, it is planned that the 2-D (inter-nodal bond separations) and 3-D (Å distances) approaches will be combined to provide suitable descriptors for similarity work.
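The abstract names three descriptors without giving their formulas. The Python sketch below illustrates one plausible reading of each, under assumptions that are ours rather than the chapter's: a molecule is an adjacency list of atom indices (2-D) or a list of Cartesian coordinates in Å (3-D); the "modification" of the Minkowski metric is assumed to be folding each torsion difference into the range 0 to 180° so that angular periodicity is respected; the 0.5 Å bin width and the city-block comparison of two frequency distributions are illustrative choices. All function names are hypothetical.

```python
"""Hypothetical sketches of the 2-D and 3-D descriptors described above.

Assumptions (not from the source): adjacency lists indexed by atom,
torsions in degrees, coordinates in angstroms, 0.5 A histogram bins.
"""
from collections import Counter, deque
import math


def bond_separation_frequencies(adjacency):
    """Count atom pairs by shortest-path bond separation.

    A breadth-first search from every atom gives the topological
    (bond-count) distance to every reachable atom; each unordered pair
    is counted once.
    """
    counts = Counter()
    for start in range(len(adjacency)):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for nbr in adjacency[atom]:
                if nbr not in dist:
                    dist[nbr] = dist[atom] + 1
                    queue.append(nbr)
        for other, d in dist.items():
            if other > start:  # avoid double-counting (i, j) and (j, i)
                counts[d] += 1
    return counts


def torsion_distance(t1, t2, p=1):
    """Minkowski distance between two torsion-angle vectors (degrees).

    Assumed modification: each difference is folded into [0, 180] so
    that, e.g., 179 and -179 are 2 degrees apart. p=1 is city block.
    """
    if len(t1) != len(t2):
        raise ValueError("torsion vectors must have equal length")
    if p < 1:
        raise ValueError("p must be >= 1 for a metric")
    total = 0.0
    for a, b in zip(t1, t2):
        diff = abs(a - b) % 360.0
        diff = min(diff, 360.0 - diff)
        total += diff ** p
    return total ** (1.0 / p)


def distance_frequency(coords, bin_width=0.5):
    """Histogram of all interatomic distances.

    Bin k holds distances in [k * bin_width, (k + 1) * bin_width).
    This replaces the full N x N distance matrix with a compact profile.
    """
    counts = Counter()
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            counts[int(d // bin_width)] += 1
    return counts


def city_block(h1, h2):
    """City-block (L1) distance between two frequency distributions."""
    keys = set(h1) | set(h2)
    return sum(abs(h1.get(k, 0) - h2.get(k, 0)) for k in keys)


if __name__ == "__main__":
    butane = [[1], [0, 2], [1, 3], [2]]          # C-C-C-C skeleton
    isobutane = [[1, 2, 3], [0], [0], [0]]       # branched C4 skeleton
    fb = bond_separation_frequencies(butane)
    fi = bond_separation_frequencies(isobutane)
    print(dict(fb), dict(fi), city_block(fb, fi))
    print(torsion_distance([60, 180], [-60, 175]))
```

For the all-carbon skeletons of n-butane and isobutane, the bond-separation frequencies come out as {1: 3, 2: 2, 3: 1} and {1: 3, 2: 3}, a city-block distance of 2, showing how the gross topology differs even though both have three bonds. Folding torsion differences matters because raw subtraction would place conformers near ±180° at opposite ends of the conformational spectrum.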
Keywords: Similarity Search; Cambridge Structural Database; Query Structure; City Block; Pattern Attribute