A Faster Grammar-Based Self-index

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7183)

Abstract

To store and search genomic databases efficiently, researchers have recently started building compressed self-indexes based on straight-line programs and LZ77. In this paper we show how, given a balanced straight-line program for a string S[1..n] whose LZ77 parse consists of z phrases, we can add \(\mathcal{O}(z \log \log z)\) words and obtain a compressed self-index for S such that, given a pattern P[1..m], we can list the \(\mathit{occ}\) occurrences of P in S in \(\mathcal{O}(m^{2} + (m + \mathit{occ}) \log \log n)\) time. All previous self-indexes are either larger or slower in the worst case.
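The quantities in these bounds (n, z, and occ) can be made concrete with a small, deliberately naive sketch. It assumes a hypothetical dictionary encoding of a straight-line program, in which each nonterminal derives either one character or the concatenation of two nonterminals, and one common greedy LZ77 variant; the names `expand`, `height`, `lz77_parse`, and `occurrences` are illustrative only. This is not the paper's self-index: it lists occurrences by brute force over the decompressed string in O(nm) time, which is exactly the work a compressed self-index is designed to avoid.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical SLP encoding (an assumption, not the paper's representation):
# each nonterminal maps to a single character (terminal rule) or to a pair
# of nonterminals (rule X -> Y Z).
Rule = Union[str, Tuple[str, str]]


def expand(slp: Dict[str, Rule], start: str) -> str:
    """Expand a straight-line program into the string it generates."""
    memo: Dict[str, str] = {}

    def go(x: str) -> str:
        if x not in memo:
            rhs = slp[x]
            memo[x] = rhs if isinstance(rhs, str) else go(rhs[0]) + go(rhs[1])
        return memo[x]

    return go(start)


def height(slp: Dict[str, Rule], x: str) -> int:
    """Parse-tree height of nonterminal x; a balanced SLP has height O(log n)."""
    rhs = slp[x]
    if isinstance(rhs, str):
        return 0
    return 1 + max(height(slp, rhs[0]), height(slp, rhs[1]))


def lz77_parse(s: str) -> List[Tuple[int, int]]:
    """Greedy LZ77 parse as (start, length) pairs; naive, cubic worst case.

    Assumed variant: a phrase is the longest prefix of s[i:] that also starts
    at some earlier position j < i (overlapping sources allowed), or a single
    character if that character has not occurred before.
    """
    phrases: List[Tuple[int, int]] = []
    i, n = 0, len(s)
    while i < n:
        best = 0
        for j in range(i):
            k = 0
            while i + k < n and s[j + k] == s[i + k]:
                k += 1
            best = max(best, k)
        length = max(best, 1)  # best == 0 means s[i] is a fresh character
        phrases.append((i, length))
        i += length
    return phrases


def occurrences(s: str, p: str) -> List[int]:
    """Brute-force reference: 1-based starting positions of p in s."""
    m = len(p)
    return [i + 1 for i in range(len(s) - m + 1) if s[i:i + m] == p]


# Usage: a balanced SLP for "abababab" (n = 8).
slp = {"X1": "a", "X2": "b", "X3": ("X1", "X2"),
       "X4": ("X3", "X3"), "X5": ("X4", "X4")}
S = expand(slp, "X5")
print(S)                      # abababab
print(height(slp, "X5"))      # 3
print(len(lz77_parse(S)))     # 3 phrases: "a", "b", "ababab"
print(occurrences(S, "aba"))  # [1, 3, 5]
```

The example grammar is balanced in the sense the abstract requires: its height, 3, equals log2 8. Positions are 1-based to match the S[1..n] notation.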




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J. (2012). A Faster Grammar-Based Self-index. In: Dediu, AH., Martín-Vide, C. (eds) Language and Automata Theory and Applications. LATA 2012. Lecture Notes in Computer Science, vol 7183. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28332-1_21


  • DOI: https://doi.org/10.1007/978-3-642-28332-1_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28331-4

  • Online ISBN: 978-3-642-28332-1

  • eBook Packages: Computer Science, Computer Science (R0)
