
Information Retrieval Journal, Volume 20, Issue 3, pp 292–316

Waves: a fast multi-tier top-k query processing algorithm

  • Caio Moura Daoud (Email author)
  • Edleno Silva de Moura
  • David Fernandes
  • Altigran Soares da Silva
  • Cristian Rossi
  • Andre Carvalho
Information Retrieval Efficiency

Abstract

In this paper, we present Waves, a novel document-at-a-time algorithm for quickly computing top-k query results in search systems. Waves processes queries over multi-tier indexes. It performs successive tentative evaluations of the results, which we call waves. Each wave traverses the index starting from a specific tier level i, and wave i may insert into the answer only those documents that occur in that tier level. After processing a wave, the algorithm checks whether the current answer could still be changed by subsequent waves. A new wave is started only if it has a chance of changing the top-k scores. We show through experiments that this lazy query processing strategy yields lower query processing times than previous approaches proposed in the literature. We also compare the performance of Waves with state-of-the-art document-at-a-time query processing methods that preserve the top-k results, and show scenarios in which the method is a good alternative for computing top-k results.
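
The wave-by-wave idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a simplified, in-memory illustration under assumptions that the abstract does not state. The index layout (a list of tiers per term, each tier a dictionary from document id to score), the additive scoring, the stopping bound, and the names `waves_top_k` and `toy_index` are all hypothetical. The sketch also assumes that tier 0 holds each term's highest-scoring postings and that a document appears in at most one tier per term.

```python
import heapq


def waves_top_k(index, query, k):
    """Toy tiered top-k evaluation loosely following the abstract.

    index: term -> list of tiers; each tier maps doc_id -> score.
    Assumed (hypothetical) layout: tier 0 holds the highest scores of a
    term, and a document occurs in at most one tier per term.
    Returns (top-k list of (score, doc_id), number of waves run).
    """
    if k <= 0:
        return [], 0
    terms = [t for t in query if t in index]
    num_tiers = max((len(index[t]) for t in terms), default=0)
    heap = []       # min-heap of (score, doc_id) holding the current top-k
    scored = set()  # documents already fully scored by an earlier wave
    waves_run = 0
    for i in range(num_tiers):
        waves_run += 1
        # Wave i only considers documents that occur in tier level i.
        candidates = set()
        for t in terms:
            if i < len(index[t]):
                candidates.update(index[t][i])
        for doc in candidates - scored:
            scored.add(doc)
            # Full score: sum the document's entries across all tiers.
            score = sum(tier.get(doc, 0.0) for t in terms for tier in index[t])
            if len(heap) < k:
                heapq.heappush(heap, (score, doc))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, doc))
        # Upper bound on the score of any document seen only in deeper tiers.
        bound = sum(
            max((max(tier.values()) for tier in index[t][i + 1:] if tier),
                default=0.0)
            for t in terms
        )
        # Start another wave only if it could still change the top-k scores.
        if len(heap) == k and heap[0][0] >= bound:
            break
    return sorted(heap, key=lambda p: (-p[0], p[1])), waves_run


# Hypothetical two-tier index with two terms.
toy_index = {
    "fast": [{1: 3.0, 2: 2.5}, {3: 1.0, 4: 0.5}],
    "query": [{1: 2.0, 3: 1.75}, {2: 0.75, 5: 0.25}],
}
top2, waves_for_top2 = waves_top_k(toy_index, ["fast", "query"], 2)
top4, waves_for_top4 = waves_top_k(toy_index, ["fast", "query"], 4)
```

In this toy setup, a request for two results stops after the first wave, because no document confined to the second tier can reach the current second-best score, while a request for four results needs a second wave. A real document-at-a-time system would traverse compressed posting lists rather than dictionaries.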

Keywords

Information retrieval, Query processing, Search system

Notes

Acknowledgements

The authors thank the E-vox/FAPEAM project and the CNPq fellowship grants (Edleno S. de Moura and Altigran S. da Silva) for financial support.


Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  • Caio Moura Daoud (1), Email author
  • Edleno Silva de Moura (1)
  • David Fernandes (1)
  • Altigran Soares da Silva (1)
  • Cristian Rossi (1)
  • Andre Carvalho (1)

  1. Institute of Computing, Federal University of Amazonas, Manaus, Brazil
