Succinct Indexes for Reporting Discriminating and Generic Words

  • Conference paper
String Processing and Information Retrieval (SPIRE 2014)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8799)

Abstract

We consider the problem of indexing a collection \(\mathcal{D}\) of D strings (documents) with n characters in total, drawn from an alphabet of size σ, such that whenever a pattern P (of p characters) and an integer τ ∈ [1, D] come as a query, we can efficiently report all (i) maximal generic words and (ii) minimal discriminating words, as defined below:

  • A maximal generic word is a maximal extension of P occurring in at least τ documents.

  • A minimal discriminating word is a minimal extension of P occurring in at most τ documents.

These problems were introduced by Kucherov et al. [8], who proposed linear-space indexes occupying O(n log n) bits with query times O(p + output) and O(p + log log n + output) for Problem (i) and Problem (ii), respectively. In this paper, we describe succinct indexes of n log σ + o(n log σ) + O(n) bits of space with near-optimal query times, i.e., O(p + log log n + output), for both problems.
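To make the two query types concrete, the following brute-force sketch enumerates the answers directly from the definitions. It is not the succinct index described in the paper and makes no attempt to match its space or time bounds. It assumes that an "extension" of P is a substring of some document having P as a prefix, that "maximal" means no one-character right extension still occurs in at least τ documents, and that "minimal" means the word occurs in at least one and at most τ documents while its one-character-shorter prefix (if it still extends P) occurs in more than τ documents. All function names are hypothetical.

```python
def doc_count(word, docs):
    """Number of documents that contain `word` as a substring."""
    return sum(word in d for d in docs)


def extensions(pattern, docs):
    """All distinct substrings of the documents that start with `pattern`."""
    found = set()
    for d in docs:
        start = d.find(pattern)
        while start != -1:
            for end in range(start + len(pattern), len(d) + 1):
                found.add(d[start:end])
            start = d.find(pattern, start + 1)
    return found


def maximal_generic_words(pattern, tau, docs):
    """Extensions occurring in >= tau documents that cannot be extended
    by one more character and still occur in >= tau documents (assumed reading)."""
    alphabet = set("".join(docs))
    result = []
    for w in extensions(pattern, docs):
        if doc_count(w, docs) < tau:
            continue
        if all(doc_count(w + c, docs) < tau for c in alphabet):
            result.append(w)
    return sorted(result)


def minimal_discriminating_words(pattern, tau, docs):
    """Extensions occurring in <= tau documents whose one-character-shorter
    prefix occurs in > tau documents (assumed reading); the pattern itself
    qualifies if it already occurs in <= tau documents."""
    result = []
    for w in extensions(pattern, docs):
        if doc_count(w, docs) > tau:
            continue
        if len(w) == len(pattern) or doc_count(w[:-1], docs) > tau:
            result.append(w)
    return sorted(result)


docs = ["abab", "abc", "bcab"]
print(maximal_generic_words("a", 2, docs))         # ['ab']
print(minimal_discriminating_words("a", 2, docs))  # ['aba', 'abc']
```

In this toy collection, "ab" occurs in all three documents, but every one-character extension of it occurs in at most one, so "ab" is the only maximal generic word for P = "a" and τ = 2. The words "aba" and "abc" are the shortest extensions that drop to at most two documents.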

This work is supported in part by National Science Foundation (NSF) Grants CCF–1017623 and CCF–1218904.

References

  1. Belazzougui, D., Navarro, G.: Alphabet-independent compressed text indexing. In: Demetrescu, C., Halldórsson, M.M. (eds.) ESA 2011. LNCS, vol. 6942, pp. 748–759. Springer, Heidelberg (2011)

  2. Chan, T.M.: Persistent predecessor search and orthogonal point location on the word RAM. In: SODA, pp. 1131–1145 (2011)

  3. Fadiel, A., Lithwick, S., Ganji, G., Scherer, S.W.: Remarkable sequence signatures in archaeal genomes. Archaea 1(3), 185–190 (2003)

  4. Fischer, J., Heun, V.: A New Succinct Representation of RMQ-Information and Improvements in the Enhanced Suffix Array. In: Chen, B., Paterson, M., Zhang, G. (eds.) ESCAPE 2007. LNCS, vol. 4614, pp. 459–470. Springer, Heidelberg (2007)

  5. Fischer, J., Heun, V., Stühler, H.M.: Practical Entropy-Bounded Schemes for O(1)-Range Minimum Queries. In: IEEE DCC, pp. 272–281 (2008)

  6. Gawrychowski, P., Kucherov, G., Nekrich, Y., Starikovskaya, T.: Minimal discriminating words problem revisited. In: Kurland, O., Lewenstein, M., Porat, E. (eds.) SPIRE 2013. LNCS, vol. 8214, pp. 129–140. Springer, Heidelberg (2013)

  7. Hon, W.-K., Shah, R., Vitter, J.S.: Space-efficient framework for top-k string retrieval problems. In: FOCS, pp. 713–722 (2009)

  8. Kucherov, G., Nekrich, Y., Starikovskaya, T.: Computing discriminating and generic words. In: Calderón-Benavides, L., González-Caro, C., Chávez, E., Ziviani, N. (eds.) SPIRE 2012. LNCS, vol. 7608, pp. 307–317. Springer, Heidelberg (2012)

  9. McCreight, E.M.: A space-economical suffix tree construction algorithm. J. ACM 23(2), 262–272 (1976)

  10. Raman, R., Raman, V., Rao, S.S.: Succinct Indexable Dictionaries with Applications to Encoding k-ary Trees and Multisets. In: ACM-SIAM SODA, pp. 233–242 (2002)

  11. Sadakane, K.: Succinct data structures for flexible text retrieval systems. J. Discrete Algorithms 5(1), 12–22 (2007)

  12. Sadakane, K., Navarro, G.: Fully-functional succinct trees. In: SODA, pp. 134–149 (2010)

  13. Weiner, P.: Linear pattern matching algorithms. In: SWAT (FOCS), pp. 1–11 (1973)

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Biswas, S., Patil, M., Shah, R., Thankachan, S.V. (2014). Succinct Indexes for Reporting Discriminating and Generic Words. In: Moura, E., Crochemore, M. (eds) String Processing and Information Retrieval. SPIRE 2014. Lecture Notes in Computer Science, vol 8799. Springer, Cham. https://doi.org/10.1007/978-3-319-11918-2_9

  • DOI: https://doi.org/10.1007/978-3-319-11918-2_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11917-5

  • Online ISBN: 978-3-319-11918-2

  • eBook Packages: Computer Science, Computer Science (R0)
