Abstract
Many measures of structural similarity have been suggested for matching chemical structures, each focusing on a particular type of molecular characteristic. The multi-faceted nature of biological activity suggests that an appropriate similarity measure should encompass many different types of characteristic, and this article discusses the use of data fusion methods to combine the results of searches based on multiple similarity measures. Experiments with several different types of dataset and activity suggest that data fusion provides a simple but effective approach to combining individual similarity measures. The best results were generally obtained with a fusion rule that sums the rank positions achieved by each molecule in the searches using the individual measures.
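The rank-sum fusion rule mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the measure names ("fingerprint", "shape"), the molecule identifiers and the similarity scores are all hypothetical. The sketch also assumes that every molecule appears in every search and that ties are broken by molecule identifier.

```python
from typing import Dict, List, Tuple


def rank_positions(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each molecule to its 1-based rank; the highest similarity gets rank 1."""
    # Assumption: ties are broken by molecule id so the ordering is deterministic.
    ordered = sorted(scores, key=lambda mol: (-scores[mol], mol))
    return {mol: pos for pos, mol in enumerate(ordered, start=1)}


def sum_rank_fusion(searches: Dict[str, Dict[str, float]]) -> List[Tuple[str, int]]:
    """Fuse several similarity searches by summing each molecule's rank positions."""
    fused: Dict[str, int] = {}
    for scores in searches.values():
        for mol, rank in rank_positions(scores).items():
            fused[mol] = fused.get(mol, 0) + rank
    # A lower summed rank means the molecule was placed near the top more consistently.
    return sorted(fused.items(), key=lambda item: (item[1], item[0]))


# Hypothetical similarity scores from two different measures (illustrative only).
searches = {
    "fingerprint": {"mol_a": 0.91, "mol_b": 0.40, "mol_c": 0.75},
    "shape": {"mol_a": 0.55, "mol_b": 0.60, "mol_c": 0.80},
}
fused_ranking = sum_rank_fusion(searches)
print(fused_ranking)  # [('mol_c', 3), ('mol_a', 4), ('mol_b', 5)]
```

In this toy example mol_c is ranked second by one measure and first by the other, so its summed rank of 3 places it ahead of mol_a, which is first under one measure but last under the other.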
Ginn, C.M., Willett, P. & Bradshaw, J. Combination of molecular similarity measures using data fusion. Perspectives in Drug Discovery and Design 20, 1–16 (2000). https://doi.org/10.1023/A:1008752200506