An Adaptive Reference Point Approach to Efficiently Search Large Chemical Databases
The ability to rapidly search large repositories of molecules is crucial in chemoinformatics. In this work we propose AOR, an approach based on adaptive reference points that improves on state-of-the-art performance in querying large repositories of binary fingerprints based on the Tanimoto distance. We present a unifying view of reference points and previously proposed hashing techniques. We also provide a mathematical model to forecast and generalize the results, which is validated by simulating queries over an excerpt of ChemDB. Finally, clustering techniques are introduced to further improve performance. In typical situations the proposed algorithm is shown to resolve queries up to 4 times faster than the compared methods.
Keywords: molecular fingerprints, chemical database, binary vector search
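The abstract does not spell out the AOR algorithm, so the following is only a generic sketch of reference-point (pivot) pruning for Tanimoto range queries, not the authors' method. It assumes fingerprints are stored as Python integers used as bit vectors, and it relies on the fact that the Tanimoto (Jaccard) distance is a metric, so the triangle inequality can rule out candidates without comparing them to the query. All function names are hypothetical.

```python
def popcount(x: int) -> int:
    """Number of set bits in a fingerprint stored as a Python int."""
    return bin(x).count("1")


def tanimoto_distance(a: int, b: int) -> float:
    """Tanimoto (Jaccard) distance: 1 - |a AND b| / |a OR b|."""
    union = popcount(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - popcount(a & b) / union


def build_index(fingerprints, reference):
    """Precompute the distance from one fixed reference point to every fingerprint."""
    return [(tanimoto_distance(reference, fp), fp) for fp in fingerprints]


def range_query(index, reference, query, radius):
    """Return fingerprints within `radius` of `query`, plus the number of
    full distance evaluations that were actually needed."""
    d_query_ref = tanimoto_distance(query, reference)
    hits, evaluated = [], 0
    for d_ref_fp, fp in index:
        # Triangle inequality: d(query, fp) >= |d(query, ref) - d(ref, fp)|,
        # so a candidate whose lower bound exceeds the radius cannot match.
        if abs(d_query_ref - d_ref_fp) > radius:
            continue
        evaluated += 1
        if tanimoto_distance(query, fp) <= radius:
            hits.append(fp)
    return hits, evaluated


# Toy 8-bit fingerprints (illustrative values only, not real molecules).
fps = [0b11110000, 0b11100000, 0b00001111, 0b00000111, 0b11111111]
reference = 0b11110000
index = build_index(fps, reference)
hits, evaluated = range_query(index, reference, 0b11110001, 0.3)
```

In this toy setup the two fingerprints that share no bits with the reference are discarded using only their precomputed distances. How reference points are chosen and adapted to the query is the subject of the paper and is not modeled here.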