Abstract
Similarity search in databases of chemical fingerprints is a fundamental task in discovering novel drug-like molecules. The multibit tree is a data structure that enables fast similarity searches of chemical fingerprints (Kristensen et al., WABI’09). However, a standard pointer-based representation of multibit trees consumes a large amount of memory when indexing large-scale fingerprint databases. To make matters worse, the original fingerprint database must also be kept in memory to filter out false positives. Succinct data structures are compact yet still support fast operations. Many have been proposed thus far, and they have been applied in fields such as full-text indexing and genome mapping. We present compact representations of both multibit trees and fingerprint databases by applying these data structures. Experiments revealed that memory usage in our representations was much smaller than that of the standard pointer-based representation. Moreover, our representations enabled us to efficiently perform PubChem-scale similarity searches.
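To make "compact and enables fast operations" concrete, the sketch below shows a rank query on a bit vector, the kind of primitive studied in the rank/select references listed below. It is an illustration only and is not taken from the paper: the class name `RankBitVector`, the 64-bit block size, and the method names are assumptions, and Python integers and lists are not space-efficient, so only the query logic carries over.

```python
class RankBitVector:
    """Bit vector with per-block cumulative popcounts (hypothetical sketch)."""

    BLOCK = 64  # bits per block; an illustrative choice, not from the paper

    def __init__(self, bits):
        self.n = len(bits)
        self.words = []   # one integer per block of BLOCK bits
        self.cum = [0]    # cum[b] = number of 1 bits before block b
        for start in range(0, self.n, self.BLOCK):
            word = 0
            for offset, bit in enumerate(bits[start:start + self.BLOCK]):
                if bit:
                    word |= 1 << offset
            self.words.append(word)
            self.cum.append(self.cum[-1] + bin(word).count("1"))

    def access(self, i):
        """Return the bit stored at position i."""
        if not 0 <= i < self.n:
            raise IndexError(i)
        return (self.words[i // self.BLOCK] >> (i % self.BLOCK)) & 1

    def rank1(self, i):
        """Return the number of 1 bits in positions [0, i)."""
        if not 0 <= i <= self.n:
            raise IndexError(i)
        block, offset = divmod(i, self.BLOCK)
        if offset == 0:
            return self.cum[block]
        # Count only the low `offset` bits of the block containing position i.
        mask = (1 << offset) - 1
        return self.cum[block] + bin(self.words[block] & mask).count("1")


if __name__ == "__main__":
    bv = RankBitVector([1, 0, 1, 1, 0, 0, 1])
    print(bv.rank1(4))  # number of 1 bits among the first four positions
```

Each query reads one stored count and one block, so its cost does not grow with the length of the vector. Practical succinct implementations add a second sampling level and hardware popcount to keep the overhead to a small fraction of the bit vector itself.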
References
Aung, Z., Ng, S.-K.: An Indexing Scheme for Fast and Accurate Chemical Fingerprint Database Searching. In: Gertz, M., Ludäscher, B. (eds.) SSDBM 2010. LNCS, vol. 6187, pp. 288–305. Springer, Heidelberg (2010)
Baldi, P., Hirschberg, D.: An Intersection Inequality Sharper than the Tanimoto Triangle Inequality for Efficiently Searching Large Databases. Journal of Chemical Information and Modeling 49, 1866–1870 (2009)
Baldi, P., Hirschberg, D., Nasr, R.: Speeding Up Chemical Database Searches Using a Proximity Filter Based on the Logical Exclusive-OR. Journal of Chemical Information and Modeling 48, 1367–1378 (2008)
Chazelle, B.: A Functional Approach to Data Structures and its Use in Multidimensional Searching. SIAM Journal on Computing 17 (1988)
Elias, P.: Efficient Storage and Retrieval by Content and Address of Static Files. Journal of the ACM 21, 246–260 (1974)
Ferragina, P., Manzini, G.: An experimental study of an opportunistic index. In: Proceedings of the Twelfth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 269–278. Society for Industrial and Applied Mathematics (2001)
Jacobson, G.: Space-efficient Static Trees and Graphs. In: Proceedings of the 30th Annual Symposium of Foundations of Computer Science, pp. 549–554 (1989)
Keiser, M., Roth, B., Armbruster, B., Ernsberger, P., Irwin, J., Shoichet, B.: Relating protein pharmacology by ligand chemistry. Nature Biotechnology 25(2), 197–206 (2007)
Leach, A., Gillet, V.: An introduction to chemoinformatics. Kluwer Academic Publishers, The Netherlands, rev. ed. (2007)
Nasr, R., Hirschberg, D., Baldi, P.: Hashing Algorithms and Data Structures for Rapid Searches of Fingerprint Vectors. Journal of Chemical Information and Modeling 50, 1358–1368 (2010)
Nasr, R., Kristensen, T., Baldi, P.: Tree and hashing data structures to speed up chemical searches: Analysis and experiments. Molecular Informatics 30, 791–800 (2011)
Navarro, G., Providel, E.: Fast, Small, Simple Rank/Select on Bitmaps. In: Proc. SEA, pp. 295–306 (2012)
Okanohara, D., Sadakane, K.: Practical Entropy-Compressed Rank/Select Dictionary. In: Workshop on Algorithm Engineering & Experiments (2007)
Raman, R., Raman, V., Rao, S.: Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: SODA, pp. 232–242 (2002)
Swamidass, S., Baldi, P.: Bounds and Algorithms for Exact Searches of Chemical Fingerprints in Linear and Sublinear time. Journal of Chemical Information and Modeling 47, 302–317 (2007)
Tarjan, R.E., Yao, A.C.: Storing a Sparse Table. Communications of the ACM 22, 606–611 (1979)
Kristensen, T.G., Nielsen, J., Pedersen, C.N.S.: A Tree Based Method for the Rapid Screening of Chemical Fingerprints. In: Salzberg, S.L., Warnow, T. (eds.) WABI 2009. LNCS, vol. 5724, pp. 194–205. Springer, Heidelberg (2009)
Kristensen, T.G., Nielsen, J., Pedersen, C.N.S.: A tree-based method for the rapid screening of chemical fingerprints. Algorithms for Molecular Biology 5 (2010)
Turan, G.: Succinct Representation of Graphs. Discrete Applied Math. 8, 289–294 (1984)
Williams, H.E., Zobel, J.: Compressing integers for fast file access. Comput. J. 42, 193–201 (1999)
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tabei, Y. (2012). Succinct Multibit Tree: Compact Representation of Multibit Trees by Using Succinct Data Structures in Chemical Fingerprint Searches. In: Raphael, B., Tang, J. (eds) Algorithms in Bioinformatics. WABI 2012. Lecture Notes in Computer Science, vol 7534. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33122-0_16
DOI: https://doi.org/10.1007/978-3-642-33122-0_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33121-3
Online ISBN: 978-3-642-33122-0
eBook Packages: Computer Science, Computer Science (R0)