Succinct Multibit Tree: Compact Representation of Multibit Trees by Using Succinct Data Structures in Chemical Fingerprint Searches

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNBI, volume 7534)

Abstract

Similarity searches in databases of chemical fingerprints are a fundamental task in discovering novel drug-like molecules. The multibit tree is a data structure that enables fast similarity searches of chemical fingerprints (Kristensen et al., WABI’09). However, a standard pointer-based representation of multibit trees consumes a large amount of memory when indexing large-scale fingerprint databases. To make matters worse, the original fingerprint databases must also be kept in memory to filter out false positives. Succinct data structures are compact yet still support fast operations. Many have been proposed so far, and they have been applied in fields such as full-text indexing and genome mapping. We present compact representations of both multibit trees and fingerprint databases by applying these data structures. Experiments revealed that the memory usage of our representations was much smaller than that of the standard pointer-based representation. Moreover, our representations enabled us to perform PubChem-scale similarity searches efficiently.
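
As a rough illustration of what "compact and enables fast operations" can mean, the sketch below shows a rank dictionary over a bit vector, one basic building block in the succinct data structure family. It is a generic example written for this summary, not the representation used in the paper: the class name `RankBitVector`, the 64-bit block size, and the method names are all assumptions chosen for illustration.

```python
class RankBitVector:
    """Bit vector with per-block prefix counts (hypothetical sketch).

    rank1(i) is answered with one table lookup plus one popcount on a
    single block, instead of scanning all i bits.
    """

    BLOCK = 64  # bits per block; an assumed, illustrative choice

    def __init__(self, bits):
        self.n = len(bits)
        self.words = []    # each block packed into one integer
        self.prefix = [0]  # prefix[k] = number of 1 bits before block k
        for start in range(0, self.n, self.BLOCK):
            word = 0
            for offset, bit in enumerate(bits[start:start + self.BLOCK]):
                if bit:
                    word |= 1 << offset
            self.words.append(word)
            self.prefix.append(self.prefix[-1] + bin(word).count("1"))

    def access(self, i):
        """Return the bit stored at position i."""
        return (self.words[i // self.BLOCK] >> (i % self.BLOCK)) & 1

    def rank1(self, i):
        """Return the number of 1 bits in positions [0, i)."""
        if not 0 <= i <= self.n:
            raise IndexError(i)
        block, offset = divmod(i, self.BLOCK)
        if offset == 0:
            return self.prefix[block]
        mask = (1 << offset) - 1  # keep only the first `offset` bits
        return self.prefix[block] + bin(self.words[block] & mask).count("1")
```

Rank queries of this kind are what allow tree navigation to be expressed on bit sequences rather than on explicit pointers, which is the general idea behind replacing a pointer-based tree with a compact encoding.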

Keywords

  • Binary Tree
  • Similarity Search
  • Memory Usage
  • Tree Node
  • Compact Representation

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Aung, Z., Ng, S.-K.: An Indexing Scheme for Fast and Accurate Chemical Fingerprint Database Searching. In: Gertz, M., Ludäscher, B. (eds.) SSDBM 2010. LNCS, vol. 6187, pp. 288–305. Springer, Heidelberg (2010)

  2. Baldi, P., Hirschberg, D.: An Intersection Inequality Sharper than the Tanimoto Triangle Inequality for Efficiently Searching Large Databases. Journal of Chemical Information and Modeling 49, 1866–1870 (2009)

  3. Baldi, P., Hirschberg, D., Nasr, R.: Speeding Up Chemical Database Searches Using a Proximity Filter Based on the Logical Exclusive-OR. Journal of Chemical Information and Modeling 48, 1367–1378 (2008)

  4. Chazelle, B.: A Functional Approach to Data Structures and its Use in Multidimensional Searching. SIAM Journal on Computing 17 (1988)

  5. Elias, P.: Efficient Storage and Retrieval by Content and Address of Static Files. Journal of the ACM 21, 246–260 (1974)

  6. Ferragina, P., Manzini, G.: An experimental study of an opportunistic index. In: Proceedings of the Twelfth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 269–278. Society for Industrial and Applied Mathematics (2001)

  7. Jacobson, G.: Space-efficient Static Trees and Graphs. In: Proceedings of the 30th Annual Symposium of Foundations of Computer Science, pp. 549–554 (1989)

  8. Keiser, M., Roth, B., Armbruster, B., Ernsberger, P., Irwin, J., Shoichet, B.: Relating protein pharmacology by ligand chemistry. Nature Biotechnology 25(2), 197–206 (2007)

  9. Leach, A., Gillet, V.: An introduction to chemoinformatics. Kluwer Academic Publishers, The Netherlands, rev. ed. (2007)

  10. Nasr, R., Hirschberg, D., Baldi, P.: Hashing Algorithms and Data Structures for Rapid Searches of Fingerprint Vectors. Journal of Chemical Information and Modeling 50, 1358–1368 (2010)

  11. Nasr, R., Kristensen, T., Baldi, P.: Tree and hashing data structures to speed up chemical searches: Analysis and experiments. Molecular Informatics 30, 791–800 (2011)

  12. Navarro, G., Providel, E.: Fast, Small, Simple Rank/Select on Bitmaps. In: Proc. SEA, pp. 295–306 (2012)

  13. Okanohara, D., Sadakane, K.: Practical Entropy-Compressed Rank/Select Dictionary. In: Workshop on Algorithm Engineering & Experiments (2007)

  14. Raman, R., Raman, V., Rao, S.: Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: SODA, pp. 232–242 (2002)

  15. Swamidass, S., Baldi, P.: Bounds and Algorithms for Exact Searches of Chemical Fingerprints in Linear and Sublinear time. Journal of Chemical Information and Modeling 47, 302–317 (2007)

  16. Tarjan, R.E., Yao, A.C.: Storing a Sparse Table. Communications of the ACM 22, 606–611 (1979)

  17. Kristensen, T.G., Nielsen, J., Pedersen, C.N.S.: A Tree Based Method for the Rapid Screening of Chemical Fingerprints. In: Salzberg, S.L., Warnow, T. (eds.) WABI 2009. LNCS, vol. 5724, pp. 194–205. Springer, Heidelberg (2009)

  18. Kristensen, T.G., Nielsen, J., Pedersen, C.N.S.: A tree-based method for the rapid screening of chemical fingerprints. Algorithms for Molecular Biology 5 (2010)

  19. Turan, G.: Succinct Representation of Graphs. Discrete Applied Math. 8, 289–294 (1984)

  20. Williams, H.E., Zobel, J.: Compressing integers for fast file access. Comput. J. 42, 193–201 (1999)

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tabei, Y. (2012). Succinct Multibit Tree: Compact Representation of Multibit Trees by Using Succinct Data Structures in Chemical Fingerprint Searches. In: Raphael, B., Tang, J. (eds) Algorithms in Bioinformatics. WABI 2012. Lecture Notes in Computer Science, vol 7534. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33122-0_16

  • DOI: https://doi.org/10.1007/978-3-642-33122-0_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33121-3

  • Online ISBN: 978-3-642-33122-0

  • eBook Packages: Computer Science, Computer Science (R0)