A generic framework for efficient computation of top-k diverse results

Islam, Md Mouinul; Asadi, Mahsa; Amer-Yahia, Sihem; Roy, Senjuti Basu

doi:10.1007/s00778-022-00770-0


Regular Paper
Published: 28 November 2022

The VLDB Journal, Volume 32, pages 737–761 (2023)

Md Mouinul Islam¹, Mahsa Asadi¹, Sihem Amer-Yahia² & Senjuti Basu Roy¹

451 Accesses
1 Citation

Abstract

Result diversification is extensively studied in the context of search, recommendation, and data exploration. Numerous algorithms return top-k results that are both diverse and relevant. These algorithms typically contain computational loops that compare the pairwise diversity of records to decide which ones to retain. We propose an access primitive, DivGetBatch(), that replaces repeated pairwise comparisons of the diversity scores of individual records with pairwise comparisons of “aggregate” diversity scores of groups of records, thereby improving the running time of these algorithms while preserving their results. We integrate the access primitive into three representative diversity algorithms and prove that the augmented algorithms return the same results as the originals. We analyze the worst-case and expected running times of these algorithms. We propose a computational framework for designing this access primitive around a pre-computed index structure, I-tree, that is agnostic to the specific details of the diversity algorithms. We develop principled solutions to construct and maintain I-tree. Our experiments on multiple large real-world datasets corroborate our theoretical findings and show speedups of up to 24×.
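The idea of comparing one aggregate diversity score for a whole group instead of one score per record can be illustrated with a small sketch. This is not the authors' DivGetBatch() or I-tree; it rests on our own assumptions: diversity is taken to be Euclidean distance, records are pre-partitioned into hypothetical fixed groups summarized by axis-aligned bounding boxes, and the loop being accelerated retains every candidate whose diversity from a selected record q is at least a threshold tau. Lower and upper bounds on a group's diversity accept or reject the whole group at once, and only groups whose bounds straddle tau fall back to pairwise checks, so the output matches the naive loop.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance; comparing it with tau**2 is equivalent to
    comparing the assumed diversity score (Euclidean distance) with tau."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


@dataclass
class Group:
    """Hypothetical group of records summarized by its axis-aligned bounding box."""
    points: List[Point]
    lo: Point = field(init=False)
    hi: Point = field(init=False)

    def __post_init__(self) -> None:
        dims = range(len(self.points[0]))
        self.lo = tuple(min(p[d] for p in self.points) for d in dims)
        self.hi = tuple(max(p[d] for p in self.points) for d in dims)

    def min_sq_dist(self, q: Point) -> float:
        # Lower bound: squared distance from q to the nearest point of the box.
        return sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, self.lo, self.hi))

    def max_sq_dist(self, q: Point) -> float:
        # Upper bound: squared distance from q to the farthest corner of the box.
        return sum(max(abs(x - l), abs(x - h)) ** 2 for x, l, h in zip(q, self.lo, self.hi))


def diverse_naive(q: Point, points: Sequence[Point], tau: float) -> Tuple[List[Point], int]:
    """Baseline loop: one pairwise diversity comparison per record."""
    return [p for p in points if sq_dist(q, p) >= tau * tau], len(points)


def diverse_batched(q: Point, groups: Sequence[Group], tau: float) -> Tuple[List[Point], int]:
    """Decide whole groups from their bounds; compare records only when needed."""
    result: List[Point] = []
    comparisons = 0
    t2 = tau * tau
    for g in groups:
        if g.min_sq_dist(q) >= t2:    # every record in g is at least tau away
            result.extend(g.points)
        elif g.max_sq_dist(q) < t2:   # no record in g can be tau away
            continue
        else:                          # bounds straddle tau: check records one by one
            comparisons += len(g.points)
            result.extend(p for p in g.points if sq_dist(q, p) >= t2)
    return result, comparisons


# Usage on a toy example with three groups and threshold tau = 2.0.
groups = [Group([(0.0, 0.0), (0.5, 0.5)]),
          Group([(5.0, 5.0), (6.0, 5.5)]),
          Group([(1.0, 2.5), (1.0, 1.0)])]
q = (0.0, 0.0)
all_points = [p for g in groups for p in g.points]
naive, n_naive = diverse_naive(q, all_points, 2.0)
batched, n_batched = diverse_batched(q, groups, 2.0)
# Both return [(5.0, 5.0), (6.0, 5.5), (1.0, 2.5)]; the batched version
# needs 2 pairwise comparisons instead of 6.
```

Working with squared distances keeps the bound checks and the per-record checks in the same arithmetic, so a group is never accepted or rejected in a way the per-record loop would disagree with. The fixed grouping is purely illustrative; how groups are built and maintained is exactly what an index such as I-tree is meant to address.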


Similar content being viewed by others

An efficient join operations for utility list-based high-utility mining approaches using hybrid search technique

Article 12 April 2024

Recommender Systems: Techniques, Applications, and Challenges

A systematic review and research perspective on recommender systems

Article Open access 03 May 2022

Notes

The diversity between a pair of records is simply 1 − their similarity.
Diversity bounds can therefore be computed directly from similarity bounds.
The code and data are available at https://anonymous.4open.science/r/divGetBatch-54BE/README.md
https://www.yelp.com/dataset/documentation/main
https://archive.ics.uci.edu/ml/datasets/gas+sensor+array+drift+dataset
https://grouplens.org/datasets/movielens/
https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_blobs.html
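The notes state that diversity is 1 − similarity and that diversity bounds follow from similarity bounds. As a worked sketch (the symbols r_i, r_j, s_min, and s_max are ours, not the paper's), an interval on similarity flips into an interval on diversity:

```latex
\mathrm{div}(r_i, r_j) = 1 - \mathrm{sim}(r_i, r_j), \qquad
s_{\min} \le \mathrm{sim}(r_i, r_j) \le s_{\max}
\;\Longrightarrow\;
1 - s_{\max} \le \mathrm{div}(r_i, r_j) \le 1 - s_{\min}.
```

For example, if a pair's similarity is only known to lie in [0.2, 0.7], its diversity lies in [0.3, 0.8].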


Author information

Authors and Affiliations

New Jersey Institute of Technology, New Jersey, USA
Md Mouinul Islam, Mahsa Asadi & Senjuti Basu Roy
CNRS, Université Grenoble Alpes, Grenoble, France
Sihem Amer-Yahia


Corresponding author

Correspondence to Senjuti Basu Roy.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Islam, M.M., Asadi, M., Amer-Yahia, S. et al. A generic framework for efficient computation of top-k diverse results. The VLDB Journal 32, 737–761 (2023). https://doi.org/10.1007/s00778-022-00770-0


Received: 18 January 2022
Revised: 25 October 2022
Accepted: 31 October 2022
Published: 28 November 2022
Issue Date: July 2023
DOI: https://doi.org/10.1007/s00778-022-00770-0

Keywords
