
Knowledge and Information Systems, Volume 51, Issue 1, pp 1–36

A survey of query result diversification

  • Kaiping Zheng
  • Hongzhi Wang (corresponding author)
  • Zhixin Qi
  • Jianzhong Li
  • Hong Gao
Survey Paper

Abstract

Nowadays, in information systems such as web search engines and databases, diversity is becoming increasingly essential and is attracting growing attention as a means of improving user satisfaction. In this context, query result diversification is of vital importance and well worth researching. Some issues, such as the definition of diversification and efficient diverse query processing, are particularly challenging to handle in information systems. Many researchers have focused on various dimensions of the diversification problem. In this survey, we aim to provide a thorough review of a wide range of result diversification techniques, including the various definitions of diversification, the corresponding algorithms, diversification techniques tailored to specific applications (databases, search engines, recommendation systems, graphs, time series and data streams), and result diversification systems. We also propose some open research directions, which are challenging and have not yet been explored, to improve the quality of query results.
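Many definitions of result diversification balance relevance to the query against redundancy among the returned items. As an illustration only, here is a minimal Python sketch of one widely used greedy scheme, the Maximal Marginal Relevance (MMR) criterion of Carbonell and Goldstein. It assumes a caller-supplied relevance function, a pairwise similarity function with values in [0, 1], and a trade-off weight `lam`; the function name `mmr_select` and the toy data are hypothetical and are not taken from the survey.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def mmr_select(
    candidates: Sequence[T],
    relevance: Callable[[T], float],
    similarity: Callable[[T, T], float],
    k: int,
    lam: float = 0.5,
) -> List[T]:
    """Greedily pick up to k items, trading relevance against redundancy.

    At each step the candidate d maximizing
        lam * relevance(d) - (1 - lam) * max_{s in selected} similarity(d, s)
    is added to the result (the MMR criterion).
    """
    remaining = list(candidates)
    selected: List[T] = []
    while remaining and len(selected) < k:
        def score(d: T) -> float:
            # Redundancy is the similarity to the closest already-selected item.
            redundancy = max((similarity(d, s) for s in selected), default=0.0)
            return lam * relevance(d) - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: document id -> (topic, relevance score).
docs = {
    "a1": ("apple", 0.95),
    "a2": ("apple", 0.94),
    "a3": ("apple", 0.93),
    "b1": ("banana", 0.80),
    "c1": ("cherry", 0.70),
}

# Plain top-3 by relevance returns three documents on the same topic.
top_by_relevance = sorted(docs, key=lambda d: docs[d][1], reverse=True)[:3]

# MMR with a topic-equality similarity spreads the result over topics.
diverse = mmr_select(
    list(docs),
    relevance=lambda d: docs[d][1],
    similarity=lambda x, y: 1.0 if docs[x][0] == docs[y][0] else 0.0,
    k=3,
    lam=0.5,
)

print(top_by_relevance)  # ['a1', 'a2', 'a3']
print(diverse)  # ['a1', 'b1', 'c1']
```

Setting `lam` to 1 reduces the sketch to plain relevance ranking, while smaller values penalize near-duplicates more strongly; the greedy loop costs O(k·n) similarity evaluations per selected item in this naive form.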

Keywords

Diversity · Query processing · Information retrieval

Notes

Acknowledgments

This paper was partially supported by the National Sci-Tech Support Plan 2015BAH10F01, NSFC Grants U1509216, 61472099 and 61133002, and the Scientific Research Foundation for the Returned Overseas Chinese Scholars of Heilongjiang Province LC2016026.


Copyright information

© Springer-Verlag London 2016

Authors and Affiliations

  • Kaiping Zheng (1)
  • Hongzhi Wang (1), corresponding author
  • Zhixin Qi (1)
  • Jianzhong Li (1)
  • Hong Gao (1)
  1. Harbin Institute of Technology, Harbin, China
