Efficient dynamic pruning on largest scores first (LSF) retrieval

Jiang, Kun; Yang, Yue-xiang

doi:10.1631/FITEE.1500190

Published: 09 January 2016

Volume 17, pages 1–14, (2016)

Frontiers of Information Technology & Electronic Engineering

Kun Jiang¹ & Yue-xiang Yang¹

Abstract

Inverted index traversal techniques have been studied to address the query processing performance challenges of web search engines, but they still leave much room for improvement. In this paper, we focus on inverted index traversal on document-sorted indexes and on the optimization technique called dynamic pruning, which can efficiently reduce the hardware computational resources required. We propose a novel exhaustive index traversal scheme called largest scores first (LSF) retrieval, in which the candidates are first selected in the posting lists of important query terms with the largest upper bound scores and then fully scored with the contribution of the remaining query terms. The scheme can effectively reduce the memory consumption of existing term-at-a-time (TAAT) retrieval and the candidate selection cost of existing document-at-a-time (DAAT) retrieval, at the expense of revisiting the posting lists of the remaining query terms. Preliminary analysis and implementation show comparable performance between LSF and the two well-known baselines. To further reduce the number of postings that need to be revisited, we present efficient rank-safe dynamic pruning techniques based on LSF, including two important optimizations called list omitting (LSF_LO) and partial scoring (LSF_PS) that make full use of query term importance. Finally, experimental results with the TREC GOV2 collection show that our new index traversal approaches reduce the query latency by almost 27% over the WAND baseline and produce slightly better results compared with the MaxScore baseline, while returning the same results as exhaustive evaluation.
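
The abstract describes LSF only at a high level, so the following Python sketch is one possible reading of the exhaustive scheme, not the authors' implementation. It assumes an in-memory index that maps each term to a docid-sorted list of (doc_id, score) pairs with precomputed per-term scores; the names `lookup` and `lsf_top_k` and the use of binary search for revisiting lists are illustrative assumptions. Terms are visited in decreasing order of their upper bound score, each unseen document is taken as a candidate from the current list, and its full score is completed by revisiting the remaining lists.

```python
import heapq
from bisect import bisect_left


def lookup(postings, doc_id):
    """Return the score of doc_id in a docid-sorted posting list, or None."""
    i = bisect_left(postings, (doc_id,))
    if i < len(postings) and postings[i][0] == doc_id:
        return postings[i][1]
    return None


def lsf_top_k(index, query_terms, k):
    """Exhaustive largest-scores-first traversal (illustrative sketch).

    index: dict mapping term -> list of (doc_id, score), sorted by doc_id.
    Returns up to k (doc_id, total_score) pairs, best first.
    """
    # Deduplicate query terms and drop terms with no postings.
    terms = [t for t in dict.fromkeys(query_terms) if index.get(t)]
    # Assumption: a term's upper bound is the largest score in its list.
    lists = sorted(
        (index[t] for t in terms),
        key=lambda pl: max(s for _, s in pl),
        reverse=True,
    )
    heap = []  # min-heap of (total_score, doc_id) holding the current top k
    for rank, postings in enumerate(lists):
        for doc_id, score in postings:
            # A document found in a more important list was already scored.
            if any(lookup(lists[j], doc_id) is not None for j in range(rank)):
                continue
            # Complete the score by revisiting the remaining lists.
            total = score
            for j in range(rank + 1, len(lists)):
                s = lookup(lists[j], doc_id)
                if s is not None:
                    total += s
            if len(heap) < k:
                heapq.heappush(heap, (total, doc_id))
            elif (total, doc_id) > heap[0]:
                heapq.heapreplace(heap, (total, doc_id))
    return sorted(((d, s) for s, d in heap), key=lambda x: (-x[1], x[0]))


# Toy index with made-up scores, for illustration only.
example_index = {
    "a": [(1, 3.0), (4, 1.0)],
    "b": [(1, 1.0), (2, 2.0), (4, 0.5)],
    "c": [(2, 0.5), (3, 0.25)],
}
print(lsf_top_k(example_index, ["a", "b", "c"], 2))  # [(1, 4.0), (2, 2.5)]
```

Unlike TAAT, this sketch keeps no per-document accumulator table, only the top-k heap; the price is the repeated lookups into the other posting lists, which is the revisiting cost the abstract mentions. The pruning variants LSF_LO and LSF_PS are not sketched here because the abstract does not specify them.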

Author information

Authors and Affiliations

College of Computer, National University of Defense Technology, Changsha, 410073, China
Kun Jiang & Yue-xiang Yang

Corresponding author

Correspondence to Kun Jiang.

Additional information

ORCID: Kun JIANG, http://orcid.org/0000-0003-1316-5237

About this article

Cite this article

Jiang, K., Yang, Yx. Efficient dynamic pruning on largest scores first (LSF) retrieval. Frontiers Inf Technol Electronic Eng 17, 1–14 (2016). https://doi.org/10.1631/FITEE.1500190

Received: 06 June 2015
Accepted: 14 October 2015
Published: 09 January 2016
Issue Date: January 2016
DOI: https://doi.org/10.1631/FITEE.1500190

Keywords

CLC number

TP393
