A generic framework for efficient computation of top-k diverse results

  • Regular Paper
  • The VLDB Journal

Abstract

Result diversification is extensively studied in the context of search, recommendation, and data exploration. Numerous algorithms return top-k results that are both diverse and relevant. These algorithms typically contain computational loops that compare the pairwise diversity of records to decide which ones to retain. We propose an access primitive, DivGetBatch(), that replaces repeated pairwise comparisons of the diversity scores of individual records with pairwise comparisons of the “aggregate” diversity scores of groups of records, thereby improving the running time of these algorithms while preserving their results. We integrate the access primitive into three representative diversity algorithms and prove that the augmented algorithms return the same results as the originals. We analyze the worst-case and expected-case running times of these algorithms. We propose a computational framework for designing this access primitive, built around a pre-computed index structure, I-tree, that is agnostic to the specific details of the diversity algorithms. We develop principled solutions to construct and maintain the I-tree. Our experiments on multiple large real-world datasets corroborate our theoretical findings and show speedups of up to \(24\times\).
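The general idea of replacing record-level comparisons with group-level ones can be illustrated with a toy sketch. This is not the paper's DivGetBatch() or I-tree, whose details are not given in the abstract. It assumes one-dimensional records in [0, 1], a hypothetical similarity of 1 - |a - b|, and groups formed by sorting and chunking. All function names are illustrative. The task is to find the record most diverse from a query record, first by comparing against every record, then by using a per-group upper bound on diversity to skip whole groups.

```python
from typing import List, Tuple


def diversity(a: float, b: float) -> float:
    """Diversity = 1 - similarity; similarity is assumed to be 1 - |a - b|."""
    similarity = 1.0 - abs(a - b)
    return 1.0 - similarity


def most_diverse_pairwise(q: float, records: List[float]) -> Tuple[float, int]:
    """Baseline: compare q against every record. Returns (record, comparisons)."""
    best = max(records, key=lambda r: diversity(q, r))
    return best, len(records)


def make_groups(records: List[float], size: int) -> List[List[float]]:
    """Toy grouping (an assumption): sort the records and cut into chunks."""
    ordered = sorted(records)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def group_upper_bound(q: float, group: List[float]) -> float:
    """Each group is sorted, so one of its two extremes is farthest from q."""
    return max(diversity(q, group[0]), diversity(q, group[-1]))


def most_diverse_batched(q: float, groups: List[List[float]]) -> Tuple[float, int]:
    """Visit groups by decreasing bound; stop once no group can beat the best."""
    best, best_div, comparisons = None, -1.0, 0
    ranked = sorted(groups, key=lambda g: group_upper_bound(q, g), reverse=True)
    for group in ranked:
        if group_upper_bound(q, group) <= best_div:
            break  # every remaining group has an even smaller bound
        for r in group:
            comparisons += 1  # counts record-level comparisons only
            d = diversity(q, r)
            if d > best_div:
                best, best_div = r, d
    return best, comparisons


records = [0.05, 0.1, 0.2, 0.3, 0.45, 0.5, 0.6, 0.8, 0.95]
print(most_diverse_pairwise(0.1, records))
print(most_diverse_batched(0.1, make_groups(records, 3)))
```

Both functions select the same record on this input, but the batched version inspects only the members of groups whose bound could still improve on the current best. The comparison counter ignores the cost of computing the per-group bounds, which a real index would pre-compute or amortize.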

Notes

  1. Diversity between a pair of records is simply \(1- similarity \) between them.

  2. Note that diversity can be calculated directly from the similarity bounds.

  3. Diversity between a pair of records is simply \(1- similarity \) between them.

  4. The code and data are available at https://anonymous.4open.science/r/divGetBatch-54BE/README.md

  5. https://www.yelp.com/dataset/documentation/main

  6. https://archive.ics.uci.edu/ml/datasets/gas+sensor+array+drift+dataset

  7. https://grouplens.org/datasets/movielens/

  8. https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_blobs.html

Author information

Corresponding author

Correspondence to Senjuti Basu Roy.

About this article

Cite this article

Islam, M.M., Asadi, M., Amer-Yahia, S. et al. A generic framework for efficient computation of top-k diverse results. The VLDB Journal 32, 737–761 (2023). https://doi.org/10.1007/s00778-022-00770-0

  • DOI: https://doi.org/10.1007/s00778-022-00770-0