Abstract
We consider the problem of indexing a collection \(\cal{D}\) of D strings (documents) with n characters in total over an alphabet of size σ, such that whenever a pattern P (of p characters) and an integer τ ∈ [1, D] arrive as a query, we can efficiently report all (i) maximal generic words and (ii) minimal discriminating words, defined as follows:
- A maximal generic word is a maximal extension of P occurring in at least τ documents.
- A minimal discriminating word is a minimal extension of P occurring in at most τ documents.
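The two definitions can be checked on small inputs with a brute-force sketch. It assumes that an extension of P is a word having P as a prefix, that reported words must occur in at least one document, that maximality is tested against one-character right extensions, and that minimality is tested against the word with its last character removed (holding trivially when the word is P itself). All function names are hypothetical, and the exhaustive substring enumeration is only a testing reference, not the succinct index described in this paper.

```python
def doc_freq(word, docs):
    """Number of documents containing `word`, each document counted once."""
    return sum(1 for d in docs if word in d)


def right_extensions(pattern, docs):
    """All distinct words pattern + u (u possibly empty) occurring in some document."""
    found = set()
    for d in docs:
        start = d.find(pattern)
        while start != -1:
            for end in range(start + len(pattern), len(d) + 1):
                found.add(d[start:end])
            start = d.find(pattern, start + 1)
    return found


def maximal_generic(pattern, tau, docs):
    # Assumed definition: df(w) >= tau and no w + a (one extra character) has df >= tau.
    ext = right_extensions(pattern, docs)
    df = {w: doc_freq(w, docs) for w in ext}
    result = []
    for w in ext:
        if df[w] < tau:
            continue
        # A one-character extension with df >= tau occurs somewhere, so it is in ext.
        children = [u for u in ext if len(u) == len(w) + 1 and u.startswith(w)]
        if all(df[u] < tau for u in children):
            result.append(w)
    return sorted(result)


def minimal_discriminating(pattern, tau, docs):
    # Assumed definition: 1 <= df(w) <= tau, and w is P itself or df(w[:-1]) > tau.
    # Because df never grows as a word is extended, checking w[:-1] covers all
    # shorter extensions of P.
    ext = right_extensions(pattern, docs)
    result = []
    for w in ext:
        if doc_freq(w, docs) > tau:
            continue
        if w == pattern or doc_freq(w[:-1], docs) > tau:
            result.append(w)
    return sorted(result)


docs = ["abab", "abc", "abd"]
print(maximal_generic("ab", 2, docs))         # ['ab']
print(minimal_discriminating("ab", 1, docs))  # ['aba', 'abc', 'abd']
```

In the example, "ab" occurs in all three documents while each of its one-character extensions occurs in only one, so "ab" is the only maximal generic word for τ = 2; "abab" is not minimal discriminating for τ = 1 because its prefix "aba" already occurs in just one document.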
These problems were introduced by Kucherov et al. [8], who proposed linear-space indexes occupying O(n log n) bits with query times O(p + output) and O(p + log log n + output) for Problem (i) and Problem (ii), respectively. In this paper, we describe succinct indexes of n log σ + o(n log σ) + O(n) bits with near-optimal query time, i.e., O(p + log log n + output), for both problems.
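To get a feel for the gap between the two space bounds, the sketch below compares only their leading terms for a hypothetical collection. The constants hidden by O(·) are taken as 1 and the o(n log σ) + O(n) lower-order terms are dropped; the function name `leading_space_bits` and the sample sizes are illustrative assumptions, not figures from the paper.

```python
import math


def leading_space_bits(n, sigma):
    """Return (n*log2(n), n*log2(sigma)): leading terms of the two space bounds.

    Assumption: every constant hidden by O(.) is taken as 1, and the
    o(n log sigma) + O(n) lower-order terms of the succinct bound are dropped.
    """
    linear_index = n * math.log2(n)        # O(n log n)-bit index
    succinct_index = n * math.log2(sigma)  # n log sigma leading term
    return linear_index, succinct_index


# Hypothetical example: a 2^30-character DNA collection (sigma = 4).
linear_bits, succinct_bits = leading_space_bits(2**30, 4)
print(linear_bits / 8 / 2**30, "GiB vs", succinct_bits / 8 / 2**30, "GiB")
```

With n = 2^30 and σ = 4, the leading terms differ by a factor of log n / log σ = 30 / 2 = 15 (about 3.75 GiB versus 0.25 GiB); the actual ratio depends on the hidden constants.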
This work is supported in part by National Science Foundation (NSF) Grants CCF–1017623 and CCF–1218904.
References
[1] Belazzougui, D., Navarro, G.: Alphabet-independent compressed text indexing. In: Demetrescu, C., Halldórsson, M.M. (eds.) ESA 2011. LNCS, vol. 6942, pp. 748–759. Springer, Heidelberg (2011)
[2] Chan, T.M.: Persistent predecessor search and orthogonal point location on the Word RAM. In: SODA, pp. 1131–1145 (2011)
[3] Fadiel, A., Lithwick, S., Ganji, G., Scherer, S.W.: Remarkable sequence signatures in archaeal genomes. Archaea 1(3), 185–190 (2003)
[4] Fischer, J., Heun, V.: A New Succinct Representation of RMQ-Information and Improvements in the Enhanced Suffix Array. In: Chen, B., Paterson, M., Zhang, G. (eds.) ESCAPE 2007. LNCS, vol. 4614, pp. 459–470. Springer, Heidelberg (2007)
[5] Fischer, J., Heun, V., Stühler, H.M.: Practical Entropy-Bounded Schemes for O(1)-Range Minimum Queries. In: IEEE DCC, pp. 272–281 (2008)
[6] Gawrychowski, P., Kucherov, G., Nekrich, Y., Starikovskaya, T.: Minimal discriminating words problem revisited. In: Kurland, O., Lewenstein, M., Porat, E. (eds.) SPIRE 2013. LNCS, vol. 8214, pp. 129–140. Springer, Heidelberg (2013)
[7] Hon, W.-K., Shah, R., Vitter, J.S.: Space-efficient framework for top-k string retrieval problems. In: FOCS, pp. 713–722 (2009)
[8] Kucherov, G., Nekrich, Y., Starikovskaya, T.: Computing discriminating and generic words. In: Calderón-Benavides, L., González-Caro, C., Chávez, E., Ziviani, N. (eds.) SPIRE 2012. LNCS, vol. 7608, pp. 307–317. Springer, Heidelberg (2012)
[9] McCreight, E.M.: A space-economical suffix tree construction algorithm. J. ACM 23(2), 262–272 (1976)
[10] Raman, R., Raman, V., Rao, S.S.: Succinct Indexable Dictionaries with Applications to Encoding k-ary Trees and Multisets. In: SODA, pp. 233–242 (2002)
[11] Sadakane, K.: Succinct data structures for flexible text retrieval systems. J. Discrete Algorithms 5(1), 12–22 (2007)
[12] Sadakane, K., Navarro, G.: Fully-functional succinct trees. In: SODA, pp. 134–149 (2010)
[13] Weiner, P.: Linear pattern matching algorithms. In: SWAT (FOCS), pp. 1–11 (1973)
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Biswas, S., Patil, M., Shah, R., Thankachan, S.V. (2014). Succinct Indexes for Reporting Discriminating and Generic Words. In: Moura, E., Crochemore, M. (eds) String Processing and Information Retrieval. SPIRE 2014. Lecture Notes in Computer Science, vol 8799. Springer, Cham. https://doi.org/10.1007/978-3-319-11918-2_9
DOI: https://doi.org/10.1007/978-3-319-11918-2_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11917-5
Online ISBN: 978-3-319-11918-2
eBook Packages: Computer Science (R0)