Abstract
In information systems such as web search engines and databases, diversity is receiving growing attention as a way to improve user satisfaction, which makes query result diversification an important research topic. Issues such as the definition of diversification and efficient diverse query processing are challenging to handle in such systems, and researchers have studied many dimensions of the diversification problem. In this survey, we provide a thorough review of a wide range of result diversification techniques, including the various definitions of diversification, the corresponding algorithms, diversification techniques specific to particular applications (databases, search engines, recommendation systems, graphs, time series and data streams), and result diversification systems. We also propose some open research directions that are challenging and have not yet been explored, with the aim of improving the quality of query results.
Acknowledgments
This paper was partially supported by National Sci-Tech Support Plan 2015BAH10F01, NSFC Grants U1509216, 61472099 and 61133002, and the Scientific Research Foundation for the Returned Overseas Chinese Scholars of Heilongjiang Province LC2016026.
Cite this article
Zheng, K., Wang, H., Qi, Z. et al. A survey of query result diversification. Knowl Inf Syst 51, 1–36 (2017). https://doi.org/10.1007/s10115-016-0990-4
DOI: https://doi.org/10.1007/s10115-016-0990-4