Abstract
In this chapter we describe a general framework, called DivDB, for evaluating and optimizing methods that diversify query results. In these methods, an initial ranked candidate set produced by a query is used to construct a result set. The elements in the result set are ranked with respect to both relevance and diversity: the retrieved elements should be as relevant as possible to the query and, at the same time, the result set should be as diverse as possible. While addressing relevance is relatively simple and has been heavily studied, diversity is a harder problem to solve. One major contribution of this work is that we adapt, implement and evaluate several existing methods for diversifying query results in the DivDB framework. We also propose two new approaches, namely the Greedy with Marginal Contribution (GMC) and the Greedy Randomized with Neighborhood Expansion (GNE) methods. Both methods iteratively construct a result set using a scoring function that ranks candidate elements not only by their relevance and their diversity with respect to the existing result set, but also by their diversity against the remaining candidates. We also present the first thorough experimental evaluation of the various diversification techniques implemented in the DivDB framework. We examine the methods’ performance with respect to precision, running time and quality of the result. Our experimental results show that, while the proposed methods have higher running times, they achieve precision very close to the optimal and also provide the best result quality. While GMC is deterministic, the randomized approach (GNE) can achieve better result quality if the user is willing to trade off running time.
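The greedy idea summarized above can be sketched as follows. This is a minimal illustration only, not the chapter's definition of GMC: the function names (`greedy_diversify`, `euclidean`), the trade-off parameter `lam`, the use of `1 / (1 + distance)` as relevance, Euclidean distance as diversity, and the exact weighting are all assumptions made for the example. At each step the sketch scores every remaining candidate by its relevance, its diversity against the elements already selected, and its largest diversities against the other remaining candidates (one per slot still to be filled), and then adds the best-scoring candidate to the result.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance, used here as an assumed diversity measure."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def greedy_diversify(
    query: Point,
    candidates: Sequence[Point],
    k: int,
    lam: float = 0.5,
    dist: Callable[[Point, Point], float] = euclidean,
) -> List[Point]:
    """Greedily pick k candidates, trading relevance against diversity.

    lam = 0 ranks by relevance only; lam = 1 ranks by diversity only.
    The scoring below is an illustrative assumption, not the published formula.
    """
    k = min(k, len(candidates))
    if k <= 0:
        return []
    remaining = list(range(len(candidates)))  # indices of unselected candidates
    selected: List[int] = []
    norm = max(k - 1, 1)  # keeps the diversity term on a comparable scale

    def score(i: int) -> float:
        c = candidates[i]
        relevance = 1.0 / (1.0 + dist(query, c))
        # Diversity against the elements already in the result set.
        div_selected = sum(dist(c, candidates[j]) for j in selected)
        # Diversity against the remaining candidates: keep only the largest
        # values, one for each slot that will still be open after picking c.
        slots_left = k - len(selected) - 1
        others = sorted(
            (dist(c, candidates[j]) for j in remaining if j != i), reverse=True
        )
        div_remaining = sum(others[:slots_left])
        return (1.0 - lam) * relevance + lam * (div_selected + div_remaining) / norm

    while len(selected) < k:
        best = max(remaining, key=score)  # ties go to the earliest candidate
        remaining.remove(best)
        selected.append(best)
    return [candidates[i] for i in selected]


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
    print(greedy_diversify((0.0, 0.0), points, k=2, lam=0.0))
    print(greedy_diversify((0.0, 0.0), points, k=2, lam=1.0))
```

With `lam=0.0` the sketch behaves like a plain relevance ranking and returns points near the query; with `lam=1.0` the distant point is pulled into the result. Re-scoring every remaining candidate in each iteration is what makes this style of method slower than a single-pass ranking, which is consistent with the higher running times reported in the abstract.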
Notes
1. These two terms are used interchangeably throughout the text.
2. This is a simplified version of the Buffered Greedy Approach [16]. In our preliminary experiments, the results of both approaches were very similar, except that the one described here is several orders of magnitude faster than the Buffered Greedy.
References
Agrawal, R., Gollapudi, S., Halverson, A., Ieong, S.: Diversifying search results. In: Proceedings of the ACM International Conference on Web Search and Data Mining (WSDM), pp. 5–14. ACM (2009). DOI http://dx.doi.org/10.1145/1498759.1498766
Carbonell, J., Goldstein, J.: The use of MMR, diversity-based reranking for reordering documents and producing summaries. In: Proceedings of the ACM SIGIR International Conference on Research and Development in Information Retrieval, pp. 335–336. ACM (1998). DOI http://dx.doi.org/10.1145/290941.291025
Carterette, B.: An analysis of NP-completeness in novelty and diversity ranking. Information Retrieval 14(1), 89–106 (2011). DOI http://dx.doi.org/10.1007/s10791-010-9157-1
Chapelle, O., Metzler, D., Zhang, Y., Grinspan, P.: Expected reciprocal rank for graded relevance. In: Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM), pp. 621–630 (2009). DOI http://dx.doi.org/10.1145/1645953.1646033
Chen, H., Karger, D.: Less is more: probabilistic models for retrieving fewer relevant documents. In: Proceedings of the ACM SIGIR International Conference on Research and Development in Information Retrieval, pp. 429–436. ACM (2006). DOI http://dx.doi.org/10.1145/1148170.1148245
Clarke, C., Kolla, M., Cormack, G., Vechtomova, O., Ashkan, A., Büttcher, S., MacKinnon, I.: Novelty and diversity in information retrieval evaluation. In: Proceedings of the ACM SIGIR International Conference on Research and Development in Information Retrieval, pp. 659–666. ACM (2008). DOI http://dx.doi.org/10.1145/1390334.1390446
Coyle, M., Smyth, B.: On the importance of being diverse. In: Z. Shi, Q. He (eds.) Intelligent Information Processing II, IFIP International Federation for Information Processing, vol. 163, pp. 341–350. Springer US (2005). DOI http://dx.doi.org/10.1007/0-387-23152-8_43
Demidova, E., Fankhauser, P., Zhou, X., Nejdl, W.: DivQ: Diversification for keyword search over structured databases. In: Proceedings of the ACM SIGIR International Conference on Research and Development in Information Retrieval, pp. 331–338. ACM (2010). DOI http://dx.doi.org/10.1145/1835449.1835506
Drosou, M., Pitoura, E.: Diversity over continuous data. IEEE Data Eng. Bull. 32(4), 49–56 (2009)
Feo, T.A., Resende, M.G.C.: Greedy randomized adaptive search procedures. J. of Global Optimization 6(2), 109–133 (1995). DOI http://dx.doi.org/10.1007/BF01096763
Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York, NY, USA (1990).
Gollapudi, S., Sharma, A.: An axiomatic approach for result diversification. In: Proceedings of the International Conference on World Wide Web (WWW), pp. 381–390. ACM (2009). DOI http://dx.doi.org/10.1145/1526709.1526761
Hadjieleftheriou, M., Tsotras, V.J.: Letter from the special issue on result diversity. IEEE Data Eng. Bull. 32(4), 6 (2009).
Hassin, R., Rubinstein, S., Tamir, A.: Approximation algorithms for maximum dispersion. Oper. Res. Lett. 21(3), 133–137 (1997). DOI http://dx.doi.org/10.1016/S0167-6377(97)00034-5
Ioannou, E., Papapetrou, O., Skoutas, D., Nejdl, W.: Efficient semantic-aware detection of near duplicate resources. In: Proceedings of the Extended Semantic Web Conference (ESWC), LNCS, pp. 136–150. Springer (2010).
Jain, A., Sarda, P., Haritsa, J.: Providing diversity in k-nearest neighbor query results. In: H. Dai, R. Srikant, C. Zhang (eds.) Advances in Knowledge Discovery and Data Mining, Lecture Notes in Computer Science, vol. 3056, pp. 404–413. Springer, Berlin Heidelberg (2004). DOI http://dx.doi.org/10.1007/978-3-540-24775-3_49
Kuby, M.J.: Programming models for facility dispersion: The p-dispersion and maxisum dispersion problems. Geogr. Analysis 19, 315–329 (1987). DOI http://dx.doi.org/10.1111/j.1538-4632.1987.tb00133.x
Kuby, M.J.: Programming models for facility dispersion: the p-dispersion and maxisum dispersion problems. Mathematical and Computer Modelling 10(10), 792 (1988). DOI http://dx.doi.org/10.1016/0895-7177(88)90094-5
Kuo, C.C., Glover, F., Dhir, K.S.: Analyzing and modeling the maximum diversity problem by zero-one programming. Decision Sciences 24(6), 1171–1185 (1993). DOI http://dx.doi.org/10.1111/j.1540-5915.1993.tb00509.x
Laguna, M., Martí, R.: GRASP and path relinking for 2-layer straight line crossing minimization. INFORMS Journal on Computing 11(1), 44–52 (1999).
van Leuken, R., Garcia, L., Olivares, X., van Zwol, R.: Visual diversification of image search results. In: Proceedings of the International Conference on World Wide Web (WWW), pp. 341–350. ACM (2009). DOI http://dx.doi.org/10.1145/1526709.1526756
Ley, M.: The DBLP Computer Science Bibliography. www.informatik.uni-trier.de/~ley/db
Liu, K., Terzi, E., Grandison, T.: Highlighting diverse concepts in documents. In: Proceedings of the SIAM International Conference on Data Mining (SDM), pp. 545–556. SIAM (2009)
Liu, Z., Sun, P., Chen, Y.: Structured search result differentiation. Proceedings of the VLDB Endowment (PVLDB) 2(1), 313–324 (2009)
Prais, M., Ribeiro, C.: Reactive GRASP: An application to a matrix decomposition problem in TDMA traffic assignment. INFORMS Journal on Computing 12(3), 164–176 (2000). DOI http://dx.doi.org/10.1287/ijoc.12.3.164.12639
Prokopyev, O., Kong, N., Martinez-Torres, D.: The equitable dispersion problem. European J. of Operational Research 197(1), 59–67 (2009)
Radlinski, F., Dumais, S.: Improving personalized web search using result diversification. In: Proceedings of the ACM SIGIR International Conference on Research and Development in Information Retrieval, pp. 691–692. ACM (2006). DOI http://dx.doi.org/10.1145/1148170.1148320
Radlinski, F., Kleinberg, R., Joachims, T.: Learning diverse rankings with multi-armed bandits. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 784–791. ACM (2008). DOI http://dx.doi.org/10.1145/1390156.1390255
Rafiei, D., Bharat, K., Shukla, A.: Diversifying web search results. In: Proceedings of the International Conference on World Wide Web (WWW), pp. 781–790. ACM (2010). DOI http://dx.doi.org/10.1145/1772690.1772770
Resende, M.G.C., Ribeiro, C.C.: Greedy randomized adaptive search procedures: Advances, hybridizations, and applications. In: M. Gendreau, J.Y. Potvin (eds.) Handbook of Metaheuristics, International Series in Operations Research & Management Science, vol. 146, 2 edn., pp. 283–320. Springer US (2010). DOI http://dx.doi.org/10.1007/978-1-4419-1665-5_10
Santos, R., Macdonald, C., Ounis, I.: Exploiting query reformulations for web search result diversification. In: Proceedings of the International Conference on World Wide Web (WWW), pp. 881–890. ACM (2010). DOI http://dx.doi.org/10.1145/1772690.1772780
Santos, R., Peng, J., Macdonald, C., Ounis, I.: Explicit search result diversification through sub-queries. In: C. Gurrin, Y. He, G. Kazai, U. Kruschwitz, S. Little, T. Roelleke, S. Rüger, K. Rijsbergen (eds.) Advances in Information Retrieval, Lecture Notes in Computer Science, vol. 5993, pp. 87–99. Springer, Berlin Heidelberg (2010). DOI http://dx.doi.org/10.1007/978-3-642-12275-0_11
Silva, G., de Andrade, M., Ochi, L., Martins, S., Plastino, A.: New heuristics for the maximum diversity problem. J. of Heuristics 13(4), 315–336 (2007). DOI http://dx.doi.org/10.1007/s10732-007-9010-x
Smyth, B., McClave, P.: Similarity vs. diversity. In: Proceedings of the International Conference on Case-Based Reasoning: Case-Based Reasoning Research and Development (ICCBR), pp. 347–361 (2001)
The Text REtrieval Conference (TREC): Trec-3 collection. http://trec.nist.gov
Vee, E., Srivastava, U., Shanmugasundaram, J., Bhat, P., Amer-Yahia, S.: Efficient computation of diverse query results. In: Proceedings of the IEEE International Conference on Data Engineering (ICDE), pp. 228–236. IEEE Computer Society (2008). DOI http://dx.doi.org/10.1109/ICDE.2008.4497431
Yu, C., Lakshmanan, L., Amer-Yahia, S.: It takes variety to make a world: diversification in recommender systems. In: Proceedings of the International Conference on Extending Database Technology (EDBT), pp. 368–378. ACM (2009). DOI http://dx.doi.org/10.1145/1516360.1516404
Zhai, C., Cohen, W., Lafferty, J.: Beyond independent relevance: methods and evaluation metrics for subtopic retrieval. In: Proceedings of the ACM SIGIR International Conference on Research and Development in Information Retrieval, pp. 10–17. ACM (2003). DOI http://dx.doi.org/10.1145/860435.860440
Zhang, B., Li, H., Liu, Y., Ji, L., Xi, W., Fan, W., Chen, Z., Ma, W.Y.: Improving web search results using affinity graph. In: Proceedings of the ACM SIGIR International Conference on Research and Development in Information Retrieval, pp. 504–511. ACM (2005). DOI http://dx.doi.org/10.1145/1076034.1076120
Zheng, Y., Zhang, L., Xie, X., Ma, W.Y.: Mining interesting locations and travel sequences from GPS trajectories. In: Proceedings of the International Conference on World Wide Web (WWW), pp. 791–800. ACM (2009). DOI http://dx.doi.org/10.1145/1526709.1526816
Zhu, X., Goldberg, A.B., Gael, J.V., Andrzejewski, D.: Improving diversity in ranking using absorbing random walks. In: Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics (NAACL-HLT), pp. 97–104 (2007)
Ziegler, C.N., McNee, S., Konstan, J., Lausen, G.: Improving recommendation lists through topic diversification. In: Proceedings of the International Conference on World Wide Web (WWW), pp. 22–32. ACM (2005). DOI http://dx.doi.org/10.1145/1060745.1060754
Copyright information
© 2013 The Author(s)
About this chapter
Cite this chapter
Vieira, M.R., Tsotras, V.J. (2013). Diversified Pattern Queries. In: Spatio-Temporal Databases. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-02408-0_5
DOI: https://doi.org/10.1007/978-3-319-02408-0_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02407-3
Online ISBN: 978-3-319-02408-0
eBook Packages: Computer Science (R0)