A survey of query result diversification

  • Survey Paper
  • Knowledge and Information Systems

Abstract

In information systems such as web search engines and databases, diversity is receiving increasing attention as a means of improving user satisfaction, which makes query result diversification an important research topic. Defining diversification and processing diverse queries efficiently are among the more challenging issues in these systems, and many researchers have studied various dimensions of the diversification problem. In this survey, we provide a thorough review of a wide range of result diversification techniques, including the various definitions of diversification, the corresponding algorithms, diversification techniques designed for specific applications (databases, search engines, recommendation systems, graphs, time series and data streams), and result diversification systems. We also propose some open research directions that are challenging and have not yet been explored, with the aim of improving the quality of query results.

Acknowledgments

This paper was partially supported by the National Sci-Tech Support Plan 2015BAH10F01, NSFC Grants U1509216, 61472099 and 61133002, and the Scientific Research Foundation for the Returned Overseas Chinese Scholars of Heilongjiang Province LC2016026.

Author information

Corresponding author

Correspondence to Hongzhi Wang.

About this article

Cite this article

Zheng, K., Wang, H., Qi, Z. et al. A survey of query result diversification. Knowl Inf Syst 51, 1–36 (2017). https://doi.org/10.1007/s10115-016-0990-4

  • DOI: https://doi.org/10.1007/s10115-016-0990-4
