Document Listing for Queries with Excluded Pattern

  • Wing-Kai Hon
  • Rahul Shah
  • Sharma V. Thankachan
  • Jeffrey Scott Vitter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7354)

Abstract

Let \(\mathcal{D} = \{d_1, d_2, \ldots, d_D\}\) be a given collection of \(D\) string documents of total length \(n\). We consider the problem of indexing \(\mathcal{D}\) such that, whenever two patterns \(P^+\) and \(P^-\) arrive as an online query, we can list all documents containing \(P^+\) but not \(P^-\). Let \(t\) denote the number of such documents. An index proposed by Fischer et al. (LATIN, 2012) can answer this query in \(O(|P^+|+|P^-|+t+\sqrt{n})\) time. However, its space requirement is \(O(n^{3/2})\) bits. We propose the first linear-space index for this problem, with a worst-case query time of \(O(|P^+|+|P^-|+\sqrt{n}\log \log n+\sqrt{nt}\log^{2.5} n)\).
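
To make the query semantics concrete, the following is a minimal sketch of a naive baseline, not the index proposed in the paper. It keeps no index at all and simply scans every document per query, so its cost grows with the total length n rather than with the output size t. The function name `list_documents_excluding` and the sample documents are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence


def list_documents_excluding(
    docs: Sequence[str], p_plus: str, p_minus: str
) -> List[int]:
    """Return the indices of documents that contain p_plus but not p_minus.

    Naive baseline (an assumption for illustration): every document is
    scanned on every query, with no preprocessing.
    """
    result = []
    for i, doc in enumerate(docs):
        # A document qualifies only if the included pattern occurs
        # and the excluded pattern does not.
        if p_plus in doc and p_minus not in doc:
            result.append(i)
    return result


if __name__ == "__main__":
    docs = ["banana", "bandana", "cabana", "apple"]
    # "ana" occurs in documents 0, 1 and 2; "nd" occurs only in document 1.
    print(list_documents_excluding(docs, "ana", "nd"))  # [0, 2], so t = 2
```

An index such as the ones discussed in the abstract aims to answer the same query without touching documents that do not match, trading preprocessing space for query time.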

Keywords

Query Time, Inverted Index, Document Retrieval, Query Answering, Pattern Path


References

  1. Belazzougui, D., Navarro, G.: Improved Compressed Indexes for Full-Text Document Retrieval. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds.) SPIRE 2011. LNCS, vol. 7024, pp. 386–397. Springer, Heidelberg (2011)
  2. Bender, M.A., Farach-Colton, M.: The LCA Problem Revisited. In: Gonnet, G.H., Viola, A. (eds.) LATIN 2000. LNCS, vol. 1776, pp. 88–94. Springer, Heidelberg (2000)
  3. Chien, Y.-F., Hon, W.-K., Shah, R., Vitter, J.S.: Geometric Burrows-Wheeler transform: Linking range searching and text indexing. In: DCC, pp. 252–261 (2008)
  4. Cohen, H., Porat, E.: Fast Set Intersection and Two Patterns Matching. Theor. Comput. Sci. 411(40-42), 3795–3800 (2010)
  5. Culpepper, J.S., Navarro, G., Puglisi, S.J., Turpin, A.: Top-k Ranked Document Search in General Text Databases. In: de Berg, M., Meyer, U. (eds.) ESA 2010, Part II. LNCS, vol. 6347, pp. 194–205. Springer, Heidelberg (2010)
  6. Ferragina, P., Giancarlo, R., Manzini, G.: The Myriad Virtues of Wavelet Trees. Inf. and Comp. 207(8), 849–866 (2009)
  7. Ferragina, P., Koudas, N., Muthukrishnan, S., Srivastava, D.: Two-dimensional substring indexing. J. Comput. Syst. Sci. 66(4), 763–774 (2003)
  8. Fischer, J., Gagie, T., Kopelowitz, T., Lewenstein, M., Mäkinen, V., Salmela, L., Välimäki, N.: Forbidden Patterns. In: Fernández-Baca, D. (ed.) LATIN 2012. LNCS, vol. 7256, pp. 327–337. Springer, Heidelberg (2012)
  9. Gagie, T., Navarro, G., Puglisi, S.J.: Colored Range Queries and Document Retrieval. In: Chavez, E., Lonardi, S. (eds.) SPIRE 2010. LNCS, vol. 6393, pp. 67–81. Springer, Heidelberg (2010)
  10. Golynski, A., Munro, J.I., Rao, S.S.: Rank/select operations on large alphabets: a tool for text indexing. In: SODA, pp. 368–373 (2006)
  11. Grossi, R., Gupta, A., Vitter, J.S.: High-Order Entropy-Compressed Text Indexes. In: SODA, pp. 841–850 (2003)
  12. Hon, W.K., Patil, M., Shah, R., Wu, S.-B.: Efficient Index for Retrieving Top-k Most Frequent Documents. Journal of Discrete Algorithms 8(4), 402–417 (2010)
  13. Hon, W.K., Shah, R., Vitter, J.S.: Space-Efficient Framework for Top-k String Retrieval Problems. In: FOCS, pp. 713–722 (2009)
  14. Hon, W.-K., Shah, R., Vitter, J.S.: Compression, Indexing, and Retrieval for Massive String Data. In: Amir, A., Parida, L. (eds.) CPM 2010. LNCS, vol. 6129, pp. 260–274. Springer, Heidelberg (2010)
  15. Hon, W.-K., Shah, R., Thankachan, S.V.: Towards an Optimal Space-and-Query-Time Index for Top-k Document Retrieval. In: Kärkkäinen, J., Stoye, J. (eds.) CPM 2012. LNCS, vol. 7354, pp. 173–184. Springer, Heidelberg (2012)
  16. Hon, W.-K., Shah, R., Thankachan, S.V., Vitter, J.S.: String Retrieval for Multi-pattern Queries. In: Chavez, E., Lonardi, S. (eds.) SPIRE 2010. LNCS, vol. 6393, pp. 55–66. Springer, Heidelberg (2010)
  17. Jansson, J., Sadakane, K., Sung, W.K.: Ultra-succinct Representation of Ordered Trees. In: SODA, pp. 575–584 (2007)
  18. Karpinski, M., Nekrich, Y.: Top-K Color Queries for Document Retrieval. In: SODA, pp. 401–411 (2011)
  19. Manber, U., Myers, G.: Suffix Arrays: A New Method for On-Line String Searches. SICOMP 22(5), 935–948 (1993)
  20. Matias, Y., Muthukrishnan, S.M., Şahinalp, S.C., Ziv, J.: Augmenting Suffix Trees, with Applications. In: Bilardi, G., Pietracaprina, A., Italiano, G.F., Pucci, G. (eds.) ESA 1998. LNCS, vol. 1461, pp. 67–78. Springer, Heidelberg (1998)
  21. Muthukrishnan, S.: Efficient Algorithms for Document Retrieval Problems. In: SODA, pp. 657–666 (2002)
  22. Navarro, G., Nekrich, Y.: Top-k document retrieval in optimal time and linear space. In: SODA, pp. 1066–1077 (2012)
  23. Navarro, G., Puglisi, S.J.: Dual-Sorted Inverted Lists. In: Chavez, E., Lonardi, S. (eds.) SPIRE 2010. LNCS, vol. 6393, pp. 309–321. Springer, Heidelberg (2010)
  24. Patil, M., Thankachan, S.V., Shah, R., Hon, W.K., Vitter, J.S., Chandrasekaran, S.: Inverted Indexes for Phrases and Strings. In: SIGIR, pp. 555–564 (2011)
  25. Raman, R., Raman, V., Rao, S.S.: Succinct Indexable Dictionaries with Applications to Encoding k-ary Trees, Prefix Sums and Multisets. TALG 3(4) (2007)
  26. Sadakane, K.: Succinct Data Structures for Flexible Text Retrieval Systems. JDA 5(1), 12–22 (2007)
  27. Välimäki, N., Mäkinen, V.: Space-Efficient Algorithms for Document Retrieval. In: Ma, B., Zhang, K. (eds.) CPM 2007. LNCS, vol. 4580, pp. 205–215. Springer, Heidelberg (2007)
  28. Weiner, P.: Linear Pattern Matching Algorithms. In: Proc. Switching and Automata Theory, pp. 1–11 (1973)

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Wing-Kai Hon (1)
  • Rahul Shah (2)
  • Sharma V. Thankachan (2)
  • Jeffrey Scott Vitter (3)

  1. National Tsing Hua University, Taiwan
  2. Louisiana State University, USA
  3. The University of Kansas, USA
