Combination of molecular similarity measures using data fusion


Many different measures of structural similarity have been suggested for matching chemical structures, each such measure focusing upon some particular type of molecular characteristic. The multi-faceted nature of biological activity suggests that an appropriate similarity measure should encompass many different types of characteristic, and this article discusses the use of data fusion methods to combine the results of searches based on multiple similarity measures. Experiments with several different types of dataset and activity suggest that data fusion provides a simple, but effective, approach to the combination of individual similarity measures. The best results were generally obtained with a fusion rule that sums the rank positions achieved by each molecule in searches using individual measures.
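The rank-sum fusion rule described above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, molecule identifiers and similarity scores are hypothetical, and the sketch assumes that every search scores the same set of molecules and that a higher score means greater similarity.

```python
from typing import Dict, List


def rank_positions(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each molecule ID to its rank under one measure (1 = most similar)."""
    # sorted() is stable, so molecules with equal scores keep their input order.
    ordered = sorted(scores, key=lambda mol: scores[mol], reverse=True)
    return {mol: pos for pos, mol in enumerate(ordered, start=1)}


def sum_rank_fusion(searches: List[Dict[str, float]]) -> List[str]:
    """Fuse several searches by summing each molecule's rank positions."""
    fused: Dict[str, int] = {}
    for scores in searches:
        for mol, rank in rank_positions(scores).items():
            fused[mol] = fused.get(mol, 0) + rank
    # A lower summed rank is better; ties are broken by molecule ID.
    return sorted(fused, key=lambda mol: (fused[mol], mol))


# Hypothetical scores from two different similarity measures.
search_a = {"A": 0.9, "B": 0.7, "C": 0.4}
search_b = {"A": 0.5, "B": 0.8, "C": 0.6}
fused_ranking = sum_rank_fusion([search_a, search_b])
print(fused_ranking)
```

Here molecule B ranks second and first in the two searches (sum 3), A ranks first and third (sum 4), and C ranks third and second (sum 5), so the fused ranking places B ahead of A and C.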

Correspondence to Peter Willett.

Ginn, C.M., Willett, P. & Bradshaw, J. Combination of molecular similarity measures using data fusion. Perspectives in Drug Discovery and Design 20, 1–16 (2000).

  • data fusion
  • database searching
  • molecular similarity
  • similarity measure