Top-k Document Retrieval in Compact Space and Near-Optimal Time

  • Gonzalo Navarro
  • Sharma V. Thankachan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8283)

Abstract

Let \(\mathcal{D} = \{d_1, d_2, \ldots, d_D\}\) be a given set of \(D\) string documents of total length \(n\). Our task is to index \(\mathcal{D}\) such that the \(k\) most relevant documents for an online query pattern \(P\) of length \(p\) can be retrieved efficiently. There exist linear-space data structures of \(O(n)\) words for answering such queries in optimal \(O(p + k)\) time. In this paper, we describe a compact index of size \(|CSA| + n\log D + o(n\log D)\) bits with near-optimal time, \(O(p + k\log^* n)\), for the basic relevance metric term-frequency, where \(|CSA|\) is the size (in bits) of a compressed full-text index of \(\mathcal{D}\), and \(\log^* n\) is the iterated logarithm of \(n\).
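To make the query semantics concrete, the following is a minimal Python sketch of a naive baseline, not the compact index described in the abstract. It scans every document, so a query costs time proportional to the total length n rather than O(p + k log* n). The function names (`count_occurrences`, `top_k_by_term_frequency`, `iterated_log2`), the 0-based document identifiers, the tie-breaking by smaller identifier, and the base-2 convention for the iterated logarithm are all assumptions made for illustration.

```python
import heapq
import math


def count_occurrences(text: str, pattern: str) -> int:
    """Term frequency: number of (possibly overlapping) occurrences of pattern in text."""
    if not pattern:
        return 0
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Advance by one position so overlapping occurrences are counted too.
        start = text.find(pattern, start + 1)
    return count


def top_k_by_term_frequency(documents, pattern, k):
    """Return up to k (doc_id, tf) pairs with tf > 0, highest tf first.

    Ties are broken by smaller doc_id (an illustrative choice).
    Naive baseline: scans all documents on every query.
    """
    scored = []
    for doc_id, doc in enumerate(documents):
        tf = count_occurrences(doc, pattern)
        if tf > 0:
            scored.append((tf, doc_id))
    best = heapq.nsmallest(k, scored, key=lambda t: (-t[0], t[1]))
    return [(doc_id, tf) for tf, doc_id in best]


def iterated_log2(n: float) -> int:
    """log* n: how many times log2 must be applied before the value drops to <= 1."""
    steps = 0
    while n > 1:
        n = math.log2(n)
        steps += 1
    return steps


if __name__ == "__main__":
    docs = ["abracadabra", "banana", "cabana", "xyz"]
    print(top_k_by_term_frequency(docs, "a", 2))
    print(top_k_by_term_frequency(docs, "ana", 3))
    print(iterated_log2(65536))
```

The iterated logarithm grows extremely slowly (for example, it is 4 for n = 65536 under the base-2 convention used here), which is why a bound of O(p + k log* n) is described as near-optimal relative to O(p + k).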

Keywords

Compact Space; Query Time; Document Retrieval; Lower Common Ancestor; Text Collection
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Gonzalo Navarro (1)
  • Sharma V. Thankachan (2)
  1. Dept. of Computer Science, University of Chile, Chile
  2. Dept. of Computer Science, Louisiana State University, USA