Ranked Document Selection

  • J. Ian Munro
  • Gonzalo Navarro
  • Rahul Shah
  • Sharma V. Thankachan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8503)

Abstract

Let \({\cal D}\) be a collection of string documents of n characters in total. The top-k document retrieval problem is to preprocess \({\cal D}\) into a data structure that, given a query (P,k), can return the k documents of \({\cal D}\) most relevant to pattern P. The relevance of a document d for a pattern P is given by a predefined ranking function w(P,d). Linear space and optimal query time solutions already exist for this problem.

In this paper we consider a novel problem, document selection queries, which aim to report the kth document most relevant to P (instead of reporting all top-k documents). We present a data structure using O(n log^ε n) space, for any constant ε > 0, answering selection queries in time O(log k / log log n), and a linear-space data structure answering queries in time O(log k), given the locus node of P in a (generalized) suffix tree of \({\cal D}\). We also prove that it is unlikely that a succinct-space solution for this problem exists with poly-logarithmic query time.
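To make the query semantics concrete, the following is a naive baseline sketch, not the data structures of the paper. It assumes term frequency (the number of occurrences of P in d) as the ranking function w(P,d), ranks only documents that contain P, and breaks ties by document index; all three choices, and the names term_frequency and select_kth_document, are illustrative assumptions. It rescans every document per query, which is exactly the cost that an index over the suffix tree is meant to avoid.

```python
from typing import Callable, Optional, Sequence


def term_frequency(pattern: str, document: str) -> int:
    """Assumed ranking function w(P, d): occurrences of P in d, overlaps included."""
    if not pattern:
        return 0
    count = 0
    start = document.find(pattern)
    while start != -1:
        count += 1
        # Advance by one position so overlapping occurrences are counted.
        start = document.find(pattern, start + 1)
    return count


def select_kth_document(
    docs: Sequence[str],
    pattern: str,
    k: int,
    w: Callable[[str, str], int] = term_frequency,
) -> Optional[int]:
    """Return the index of the k-th most relevant document for pattern.

    Naive baseline: scores every document, sorts, and picks rank k.
    Documents with score 0 are ignored (an assumption of this sketch).
    Returns None when fewer than k documents are relevant.
    """
    scored = [(w(pattern, d), i) for i, d in enumerate(docs)]
    relevant = [(score, i) for score, i in scored if score > 0]
    # Highest score first; ties broken by smaller document index.
    relevant.sort(key=lambda pair: (-pair[0], pair[1]))
    if k < 1 or k > len(relevant):
        return None
    return relevant[k - 1][1]


if __name__ == "__main__":
    collection = ["abracadabra", "banana", "cabana", "xyz"]
    print(select_kth_document(collection, "a", 2))
```

A top-k query would return the first k entries of the sorted list, whereas the selection query returns only the entry at rank k.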

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • J. Ian Munro (1)
  • Gonzalo Navarro (2)
  • Rahul Shah (3)
  • Sharma V. Thankachan (1)

  1. Cheriton School of CS, Univ. Waterloo, Canada
  2. Dept. of CS, Univ. Chile, Chile
  3. School of EECS, Louisiana State Univ., USA
