Does Selective Search Benefit from WAND Optimization?

  • Yubin Kim
  • Jamie Callan
  • J. Shane Culpepper
  • Alistair Moffat
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9626)

Abstract

Selective search is a distributed retrieval technique that reduces the computational cost of large-scale information retrieval. By partitioning the collection into topical shards, and using a resource selection algorithm to identify a subset of shards to search, selective search allows retrieval effectiveness to be maintained while evaluating fewer postings, often reducing querying cost by 90% or more. However, there has been only limited attention given to the interaction between dynamic pruning algorithms and topical index shards. We demonstrate that the WAND dynamic pruning algorithm is more effective on topical index shards than it is on randomly organized index shards, and that the savings generated by selective search and WAND are additive. We also compare two methods for applying WAND to topical shards: searching each shard with a separate top-k heap and threshold; and sequentially passing a shared top-k heap and threshold from one shard to the next, in the order established by a resource selection mechanism. Separate top-k heaps provide low query latency, whereas a shared top-k heap provides higher throughput.
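
The two ways of applying a pruning threshold across shards can be illustrated with a minimal sketch. This is not the authors' implementation and it is not full WAND: it assumes, purely for illustration, that each document arrives with a precomputed score upper bound and its true score, and it skips any document whose upper bound cannot beat the current heap threshold. The function names (`search_shard`, `separate_heaps`, `shared_heap`) and the toy shard data are hypothetical.

```python
import heapq

def search_shard(docs, k, heap=None):
    """Scan one shard with threshold pruning (a simplified stand-in for WAND).

    docs: list of (doc_id, upper_bound, score) with upper_bound >= score.
    heap: optional min-heap of (score, doc_id) carried over from earlier shards.
    Returns the heap and the number of documents that were fully scored.
    """
    if heap is None:
        heap = []
    scored = 0
    for doc_id, upper_bound, score in docs:
        # The threshold is the k-th best score so far, once the heap is full.
        threshold = heap[0][0] if len(heap) == k else float("-inf")
        if upper_bound <= threshold:
            continue  # pruned: this document cannot enter the top k
        scored += 1
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > threshold:
            heapq.heapreplace(heap, (score, doc_id))
    return heap, scored

def separate_heaps(shards, k):
    """Each shard gets its own heap and threshold; results are merged afterwards."""
    merged, total_scored = [], 0
    for docs in shards:
        heap, scored = search_shard(docs, k)
        merged.extend(heap)
        total_scored += scored
    return heapq.nlargest(k, merged), total_scored

def shared_heap(shards, k):
    """One heap and threshold are passed from shard to shard, in the given order."""
    heap, total_scored = [], 0
    for docs in shards:
        heap, scored = search_shard(docs, k, heap)
        total_scored += scored
    return sorted(heap, reverse=True), total_scored

# Toy shards, assumed to be listed in resource-selection order.
shards = [
    [("a", 9.0, 8.0), ("b", 7.0, 6.5), ("c", 5.0, 4.0)],
    [("d", 6.0, 5.0), ("e", 3.0, 2.5), ("f", 8.5, 7.0)],
]

print(separate_heaps(shards, k=2))
print(shared_heap(shards, k=2))
```

In this toy setting both strategies return the same top-k list, but the shared heap scores fewer documents because the second shard starts with a threshold inherited from the first. The separate-heap loop, by contrast, has no dependency between shards, so in a real system the shards could be searched in parallel, which is consistent with the latency and throughput trade-off described in the abstract.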

Keywords

Selective search, Distributed search, Dynamic pruning, Efficiency

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Yubin Kim (1)
  • Jamie Callan (1)
  • J. Shane Culpepper (2)
  • Alistair Moffat (3)
  1. Carnegie Mellon University, Pittsburgh, USA
  2. RMIT University, Melbourne, Australia
  3. The University of Melbourne, Melbourne, Australia
