Mezey, P.G. J Math Chem (2009) 45: 1. doi:10.1007/s10910-008-9365-8
Molecular databases obtained either by combinatorial chemistry tools or by more traditional methods are usually organized according to a set of molecular properties. A database may be regarded as a multidimensional collection of points within a space spanned by the various molecular properties of interest, the property space. Some properties are likely to be more important than others; those considered important form the essential dimensions of the molecular database. How many properties are essential depends on the molecular problem addressed; however, the search in property space is usually limited to a few dimensions. Two types of search strategy can be distinguished: search by property and search by lead compound. The first corresponds to a lattice model, where the search is based on sets of adjacent blocks, usually hypercubes in property space, whereas a lead-based search can be regarded as a search around a center in property space. A natural model for lead-based searches is therefore a hyperspherical one. In this contribution, a theoretical optimum dimension is determined that enhances the effectiveness of lead-based searches in the property space of molecular databases.
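The contrast between the lattice (hypercube) model and the hyperspherical model can be made concrete with standard geometry. The sketch below is an illustration only, not the derivation used in the paper: it computes the volume of a unit-radius ball in n dimensions, pi^(n/2) / Gamma(n/2 + 1), and the fraction of the enclosing hypercube of side 2 that the ball fills. The function names `unit_ball_volume` and `ball_to_cube_ratio` are hypothetical, and the abstract does not say which criterion defines the optimum dimension, so none is assumed here.

```python
import math


def unit_ball_volume(n: int) -> float:
    """Volume of the unit-radius ball in n dimensions: pi^(n/2) / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def ball_to_cube_ratio(n: int) -> float:
    """Fraction of the enclosing hypercube (side length 2) filled by the unit ball."""
    return unit_ball_volume(n) / 2 ** n


if __name__ == "__main__":
    # Tabulate both quantities for a few property-space dimensions.
    for n in range(1, 11):
        print(n, round(unit_ball_volume(n), 4), round(ball_to_cube_ratio(n), 4))
```

Running the script shows two well-known effects: the unit-ball volume is not monotonic in n, and the ball occupies a rapidly shrinking share of its enclosing hypercube as n grows, so a search around a center and a search over adjacent hypercubes cover increasingly different regions in high dimensions.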
Molecular databases; Lead-based sampling in QSAR; Database dimension; Sampling errors in high dimensions; QShAR (Quantitative Shape-Activity Relations)