Distributed and Parallel Databases, Volume 30, Issue 3–4, pp 209–237

On optimality-ratio and coverage in ranking of joined search results

Article

Abstract

Complex search tasks often require posing several basic search queries, joining the answers to these queries, where each answer is given as a ranked list of items, and returning a ranked list of combinations. However, the join result may include too many repetitions of items, and hence the entire join is frequently too large to be useful. This can be solved by choosing a small subset of the join result. The focus of this paper is on how to choose this subset. We propose two measures for estimating the quality of result sets, namely, coverage and optimality ratio. Intuitively, maximizing the coverage aims at including in the result as many appearances as possible of items in their optimal combination, and maximizing the optimality ratio means striving to have each item appear only in its optimal combination, i.e., only in the most highly ranked combination that contains it. One of the difficulties in choosing the subset of the join in a complex search is that maximizing the coverage conflicts with maximizing the optimality ratio.

In this paper, we introduce the measures coverage and optimality ratio. We present new semantics for complex search queries, aiming at providing high coverage and high optimality ratio. We examine the quality of the results of the existing and the novel semantics according to these two measures, and we provide algorithms for answering complex search queries under the new semantics. Finally, we present an experimental study, using Yahoo! Local Search Web Services, of the efficiency and the scalability of our algorithms, showing that complex search queries can be evaluated effectively under the proposed semantics.
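
The intuition behind the two measures described in the abstract can be illustrated with a small sketch. The formalization below is an assumption made only for illustration and is not taken from the paper's formal definitions: a combination is a tuple with one item per joined list, an item's optimal combination is the highest-scored combination in the full join that contains it, coverage is the fraction of items whose optimal combination is in the chosen subset, and optimality ratio is the fraction of item appearances in the subset that occur in the item's optimal combination. The names `optimal_combinations`, `coverage`, `optimality_ratio`, and the hotel/restaurant data are hypothetical.

```python
from typing import Dict, List, Tuple

Combo = Tuple[str, ...]        # one item per joined list (assumed representation)
Scored = Tuple[Combo, float]   # (combination, score); higher score ranks first
Key = Tuple[int, str]          # (list position, item)


def optimal_combinations(join: List[Scored]) -> Dict[Key, Combo]:
    """Map each (position, item) to the top-scored combination containing it."""
    best: Dict[Key, Scored] = {}
    for combo, score in join:
        for key in enumerate(combo):
            # Ties are broken in favor of the combination seen first.
            if key not in best or score > best[key][1]:
                best[key] = (combo, score)
    return {key: combo for key, (combo, _) in best.items()}


def coverage(join: List[Scored], subset: List[Combo]) -> float:
    """Assumed definition: share of items whose optimal combination was chosen."""
    optimal = optimal_combinations(join)
    chosen = set(subset)
    covered = sum(1 for combo in optimal.values() if combo in chosen)
    return covered / len(optimal)


def optimality_ratio(join: List[Scored], subset: List[Combo]) -> float:
    """Assumed definition: share of appearances that are in an optimal combination."""
    optimal = optimal_combinations(join)
    appearances = [(key, combo) for combo in subset for key in enumerate(combo)]
    if not appearances:
        return 1.0  # arbitrary convention for an empty subset
    good = sum(1 for key, combo in appearances if optimal[key] == combo)
    return good / len(appearances)


# Hypothetical join of a ranked hotel list with a ranked restaurant list.
join = [
    (("h1", "r1"), 0.9),
    (("h1", "r2"), 0.8),
    (("h2", "r1"), 0.7),
    (("h2", "r2"), 0.5),
]
top1 = [("h1", "r1")]
top3 = [("h1", "r1"), ("h1", "r2"), ("h2", "r1")]

print(coverage(join, top1), optimality_ratio(join, top1))  # 0.5 1.0
print(coverage(join, top3), optimality_ratio(join, top3))  # 1.0 and about 0.667
```

Under these assumed definitions, taking only the top combination gives a perfect optimality ratio but covers half of the items, while taking the top three covers every item at the cost of repeating `h1` and `r1` in non-optimal combinations. This is the kind of conflict between the two measures that the abstract mentions.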

Keywords

Data integration, Search, Top-k, Join, Diversity

Notes

Acknowledgements

This work was partially supported by The Israeli Ministry of Science and Technology (Grant 3/6472) and by the German–Israeli Foundation for Scientific Research and Development (Grant 2165/07).

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Department of Computer Science, Technion, Haifa, Israel
