Diversified Pattern Queries

Vieira, Marcos R.; Tsotras, Vassilis J.

doi:10.1007/978-3-319-02408-0_5


Part of the book series: SpringerBriefs in Computer Science (BRIEFSCOMPUTER)

Abstract

In this chapter we describe a general framework, called DivDB, for evaluating and optimizing methods that diversify query results. In these methods, an initial ranked candidate set produced by a query is used to construct a result set. The elements in the result set are ranked with respect to relevance and diversity, i.e., the retrieved elements should be as relevant as possible to the query and, at the same time, the result set should be as diverse as possible. While addressing relevance is relatively simple and has been heavily studied, diversity is a harder problem to solve. One major contribution of this work is that we adapt, implement and evaluate several existing methods for diversifying query results in the DivDB framework. We also propose two new approaches, namely the Greedy with Marginal Contribution (GMC) and the Greedy Randomized with Neighborhood Expansion (GNE) methods. Both methods iteratively construct a result set using a scoring function that ranks each candidate element not only by its relevance and its diversity with respect to the existing result set, but also by its diversity against the remaining candidates. We also present the first thorough experimental evaluation of the various diversification techniques implemented in the DivDB framework. We examine the methods’ performance with respect to precision, running time and quality of the result. Our experimental results show that while the proposed methods have higher running times, they achieve precision very close to the optimal, while also providing the best result quality. While GMC is deterministic, the randomized approach (GNE) can achieve better result quality if the user is willing to trade off running time.
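The abstract does not spell out the scoring functions, so the following is a hedged sketch rather than the authors' DivDB implementation. It assumes a max-sum objective that combines each element's relevance to the query (`rel`, higher is better) with pairwise diversity (`div`, a symmetric distance, higher is more diverse) under a trade-off parameter `lam`. All names (`objective`, `marginal`, `gmc`, `local_search`, `gne`) and the parameters `alpha`, `iterations`, and `seed` are hypothetical. `marginal` follows the abstract's description of GMC (relevance, diversity to the current result, and diversity against the remaining candidates). `gne` assumes a GRASP-style structure (randomized greedy construction from a restricted candidate list, followed by swap-based local search), in the spirit of the GRASP works in the reference list.

```python
import random
from itertools import combinations
from typing import List, Sequence, Set, Tuple

# Hypothetical sketch: the objective and scores below are assumptions, not the authors' code.
Matrix = Sequence[Sequence[float]]


def objective(result: Sequence[int], rel: Sequence[float], div: Matrix, lam: float) -> float:
    """Assumed max-sum score: weighted relevance plus pairwise diversity of the set."""
    k = len(result)
    relevance = sum(rel[i] for i in result)
    diversity = sum(div[i][j] for i, j in combinations(result, 2))
    # (k - 1) and 2 balance k relevance terms against k(k-1)/2 diversity pairs.
    return (k - 1) * (1 - lam) * relevance + 2 * lam * diversity


def marginal(i: int, result: List[int], remaining: Set[int],
             rel: Sequence[float], div: Matrix, lam: float, k: int) -> float:
    """GMC-style score: relevance, diversity to the chosen set, and a look-ahead
    over the most diverse remaining candidates for the slots still open."""
    w = lam / (k - 1) if k > 1 else 0.0
    score = (1 - lam) * rel[i] + w * sum(div[i][j] for j in result)
    slots = k - len(result) - 1
    ahead = sorted((div[i][j] for j in remaining if j != i), reverse=True)
    return score + w * sum(ahead[:slots])


def gmc(n: int, k: int, rel: Sequence[float], div: Matrix, lam: float) -> List[int]:
    """Deterministic greedy: repeatedly add the candidate with the best marginal score."""
    result: List[int] = []
    remaining = set(range(n))
    while len(result) < k:
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining),
                   key=lambda i: marginal(i, result, remaining, rel, div, lam, k))
        result.append(best)
        remaining.remove(best)
    return result


def local_search(result: List[int], remaining: Set[int],
                 rel: Sequence[float], div: Matrix, lam: float) -> List[int]:
    """Swap one selected element for an unselected one while the objective improves."""
    result, remaining = list(result), set(remaining)
    current = objective(result, rel, div, lam)
    improved = True
    while improved:
        improved = False
        for pos in range(len(result)):
            for j in sorted(remaining):
                trial = result[:pos] + [j] + result[pos + 1:]
                value = objective(trial, rel, div, lam)
                if value > current + 1e-12:
                    remaining.add(result[pos])  # swapped-out element becomes a candidate
                    remaining.remove(j)
                    result, current, improved = trial, value, True
                    break
            if improved:
                break
    return result


def gne(n: int, k: int, rel: Sequence[float], div: Matrix, lam: float,
        alpha: float = 0.3, iterations: int = 50, seed: int = 0) -> Tuple[List[int], float]:
    """GRASP-style loop (assumed): randomized greedy construction plus local search."""
    rng = random.Random(seed)
    best: List[int] = []
    best_score = float("-inf")
    for _ in range(iterations):
        result: List[int] = []
        remaining = set(range(n))
        while len(result) < k:
            scores = {i: marginal(i, result, remaining, rel, div, lam, k) for i in remaining}
            hi, lo = max(scores.values()), min(scores.values())
            # Restricted candidate list: every candidate within alpha of the best score.
            rcl = sorted(i for i, s in scores.items() if s >= hi - alpha * (hi - lo))
            pick = rng.choice(rcl)
            result.append(pick)
            remaining.remove(pick)
        result = local_search(result, remaining, rel, div, lam)
        score = objective(result, rel, div, lam)
        if score > best_score:
            best, best_score = result, score
    return best, best_score
```

The sketch assumes `2 <= k <= n`. With `alpha = 0` the construction step is purely greedy; larger `alpha` values and more `iterations` explore more of the candidate space at a higher running-time cost, which mirrors the trade-off the abstract describes for GNE.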

Notes

1. These two terms are used interchangeably throughout the text.
2. This is a simplified version of the Buffered Greedy Approach [16] (Jain et al.). In our preliminary experiments, both approaches produced very similar results, except that the version described here is several orders of magnitude faster than Buffered Greedy.

References

Agrawal, R., Gollapudi, S., Halverson, A., Ieong, S.: Diversifying search results. In: Proceedings of the ACM International Conference on Web Search and Data Mining (WSDM), pp. 5–14. ACM (2009). DOI http://dx.doi.org/10.1145/1498759.1498766
Carbonell, J., Goldstein, J.: The use of MMR, diversity-based reranking for reordering documents and producing summaries. In: Proceedings of the ACM SIGIR International Conference on Research and Development in Information Retrieval, pp. 335–336. ACM (1998). DOI http://dx.doi.org/10.1145/290941.291025
Carterette, B.: An analysis of NP-completeness in novelty and diversity ranking. Information Retrieval 14(1), 89–106 (2011). DOI http://dx.doi.org/10.1007/s10791-010-9157-1
Chapelle, O., Metzler, D., Zhang, Y., Grinspan, P.: Expected reciprocal rank for graded relevance. In: Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM), pp. 621–630 (2009). DOI http://dx.doi.org/10.1145/1645953.1646033
Chen, H., Karger, D.: Less is more: probabilistic models for retrieving fewer relevant documents. In: Proceedings of the ACM SIGIR International Conference on Research and Development in Information Retrieval, pp. 429–436. ACM (2006). DOI http://dx.doi.org/10.1145/1148170.1148245
Clarke, C., Kolla, M., Cormack, G., Vechtomova, O., Ashkan, A., Büttcher, S., MacKinnon, I.: Novelty and diversity in information retrieval evaluation. In: Proceedings of the ACM SIGIR International Conference on Research and Development in Information Retrieval, pp. 659–666. ACM (2008). DOI http://dx.doi.org/10.1145/1390334.1390446
Coyle, M., Smyth, B.: On the importance of being diverse. In: Z. Shi, Q. He (eds.) Intelligent Information Processing II, IFIP International Federation for Information Processing, vol. 163, pp. 341–350. Springer US (2005). DOI http://dx.doi.org/10.1007/0-387-23152-8_43
Demidova, E., Fankhauser, P., Zhou, X., Nejdl, W.: DivQ: Diversification for keyword search over structured databases. In: Proceedings of the ACM SIGIR International Conference on Research and Development in Information Retrieval, pp. 331–338. ACM (2010). DOI http://dx.doi.org/10.1145/1835449.1835506
Drosou, M., Pitoura, E.: Diversity over continuous data. IEEE Data Eng. Bull. 32(4), 49–56 (2009)
Feo, T.A., Resende, M.G.C.: Greedy randomized adaptive search procedures. J. of Global Optimization 6(2), 109–133 (1995). DOI http://dx.doi.org/10.1007/BF01096763
Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York, NY, USA (1990).
Gollapudi, S., Sharma, A.: An axiomatic approach for result diversification. In: Proceedings of the International Conference on World Wide Web (WWW), pp. 381–390. ACM (2009). DOI http://dx.doi.org/10.1145/1526709.1526761
Hadjieleftheriou, M., Tsotras, V.J.: Letter from the special issue on result diversity. IEEE Data Eng. Bull. 32(4), 6 (2009).
Hassin, R., Rubinstein, S., Tamir, A.: Approximation algorithms for maximum dispersion. Oper. Res. Lett. 21(3), 133–137 (1997). DOI http://dx.doi.org/10.1016/S0167-6377(97)00034-5
Ioannou, E., Papapetrou, O., Skoutas, D., Nejdl, W.: Efficient semantic-aware detection of near duplicate resources. In: Proceedings of the Extended Semantic Web Conference (ESWC), LNCS, pp. 136–150. Springer (2010).
Jain, A., Sarda, P., Haritsa, J.: Providing diversity in k-nearest neighbor query results. In: H. Dai, R. Srikant, C. Zhang (eds.) Advances in Knowledge Discovery and Data Mining, Lecture Notes in Computer Science, vol. 3056, pp. 404–413. Springer, Berlin Heidelberg (2004). DOI http://dx.doi.org/10.1007/978-3-540-24775-3_49
Kuby, M.J.: Programming models for facility dispersion: The p-dispersion and maxisum dispersion problems. Geogr. Analysis 19, 315–329 (1987). DOI http://dx.doi.org/10.1111/j.1538-4632.1987.tb00133.x
Kuby, M.J.: Programming models for facility dispersion: the p-dispersion and maxisum dispersion problems. Mathematical and Computer Modelling 10(10), 792 (1988). DOI http://dx.doi.org/10.1016/0895-7177(88)90094-5
Kuo, C.C., Glover, F., Dhir, K.S.: Analyzing and modeling the maximum diversity problem by zero-one programming. Decision Sciences 24(6), 1171–1185 (1993). DOI http://dx.doi.org/10.1111/j.1540-5915.1993.tb00509.x
Laguna, M., Martí, R.: GRASP and path relinking for 2-layer straight line crossing minimization. INFORMS Journal on Computing 11(1), 44–52 (1999).
van Leuken, R., Garcia, L., Olivares, X., van Zwol, R.: Visual diversification of image search results. In: Proceedings of the International Conference on World Wide Web (WWW), pp. 341–350. ACM (2009). DOI http://dx.doi.org/10.1145/1526709.1526756
Ley, M.: The DBLP Computer Science Bibliography. www.informatik.uni-trier.de/~ley/db
Liu, K., Terzi, E., Grandison, T.: Highlighting diverse concepts in documents. In: Proceedings of the SIAM International Conference on Data Mining (SDM), pp. 545–556. SIAM (2009)
Liu, Z., Sun, P., Chen, Y.: Structured search result differentiation. Proceedings of the VLDB Endowment (PVLDB) 2(1), 313–324 (2009)
Prais, M., Ribeiro, C.: Reactive GRASP: An application to a matrix decomposition problem in TDMA traffic assignment. INFORMS Journal on Computing 12(3), 164–176 (2000). DOI http://dx.doi.org/10.1287/ijoc.12.3.164.12639
Prokopyev, O., Kong, N., Martinez-Torres, D.: The equitable dispersion problem. European J. of Operational Research 197(1), 59–67 (2009)
Radlinski, F., Dumais, S.: Improving personalized web search using result diversification. In: Proceedings of the ACM SIGIR International Conference on Research and Development in Information Retrieval, pp. 691–692. ACM (2006). DOI http://dx.doi.org/10.1145/1148170.1148320
Radlinski, F., Kleinberg, R., Joachims, T.: Learning diverse rankings with multi-armed bandits. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 784–791. ACM (2008). DOI http://dx.doi.org/10.1145/1390156.1390255
Rafiei, D., Bharat, K., Shukla, A.: Diversifying web search results. In: Proceedings of the International Conference on World Wide Web (WWW), pp. 781–790. ACM (2010). DOI http://dx.doi.org/10.1145/1772690.1772770
Resende, M.G.C., Ribeiro, C.C.: Greedy randomized adaptive search procedures: Advances, hybridizations, and applications. In: M. Gendreau, J.Y. Potvin (eds.) Handbook of Metaheuristics, International Series in Operations Research & Management Science, vol. 146, 2 edn., pp. 283–320. Springer US (2010). DOI http://dx.doi.org/10.1007/978-1-4419-1665-5_10
Santos, R., Macdonald, C., Ounis, I.: Exploiting query reformulations for web search result diversification. In: Proceedings of the International Conference on World Wide Web (WWW), pp. 881–890. ACM (2010). DOI http://dx.doi.org/10.1145/1772690.1772780
Santos, R., Peng, J., Macdonald, C., Ounis, I.: Explicit search result diversification through sub-queries. In: C. Gurrin, Y. He, G. Kazai, U. Kruschwitz, S. Little, T. Roelleke, S. Rüger, K. Rijsbergen (eds.) Advances in Information Retrieval, Lecture Notes in Computer Science, vol. 5993, pp. 87–99. Springer, Berlin Heidelberg (2010). DOI http://dx.doi.org/10.1007/978-3-642-12275-0_11
Silva, G., de Andrade, M., Ochi, L., Martins, S., Plastino, A.: New heuristics for the maximum diversity problem. J. of Heuristics 13(4), 315–336 (2007). DOI http://dx.doi.org/10.1007/s10732-007-9010-x
Smyth, B., McClave, P.: Similarity vs. diversity. In: Proceedings of the International Conference on Case-Based Reasoning: Case-Based Reasoning Research and Development (ICCBR), pp. 347–361 (2001)
The Text REtrieval Conference (TREC): Trec-3 collection. http://trec.nist.gov
Vee, E., Srivastava, U., Shanmugasundaram, J., Bhat, P., Amer-Yahia, S.: Efficient computation of diverse query results. In: Proceedings of the IEEE International Conference on Data Engineering (ICDE), pp. 228–236. IEEE Computer Society (2008). DOI http://dx.doi.org/10.1109/ICDE.2008.4497431
Yu, C., Lakshmanan, L., Amer-Yahia, S.: It takes variety to make a world: diversification in recommender systems. In: Proceedings of the International Conference on Extending Database Technology (EDBT), pp. 368–378. ACM (2009). DOI http://dx.doi.org/10.1145/1516360.1516404
Zhai, C., Cohen, W., Lafferty, J.: Beyond independent relevance: methods and evaluation metrics for subtopic retrieval. In: Proceedings of the ACM SIGIR International Conference on Research and Development in Information Retrieval, pp. 10–17. ACM (2003). DOI http://dx.doi.org/10.1145/860435.860440
Zhang, B., Li, H., Liu, Y., Ji, L., Xi, W., Fan, W., Chen, Z., Ma, W.Y.: Improving web search results using affinity graph. In: Proceedings of the ACM SIGIR International Conference on Research and Development in Information Retrieval, pp. 504–511. ACM (2005). DOI http://dx.doi.org/10.1145/1076034.1076120
Zheng, Y., Zhang, L., Xie, X., Ma, W.Y.: Mining interesting locations and travel sequences from GPS trajectories. In: Proceedings of the International Conference on World Wide Web (WWW), pp. 791–800. ACM (2009). DOI http://dx.doi.org/10.1145/1526709.1526816
Zhu, X., Goldberg, A.B., Gael, J.V., Andrzejewski, D.: Improving diversity in ranking using absorbing random walks. In: Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), pp. 97–104 (2007)
Ziegler, C.N., McNee, S., Konstan, J., Lausen, G.: Improving recommendation lists through topic diversification. In: Proceedings of the International Conference on World Wide Web (WWW), pp. 22–32. ACM (2005). DOI http://dx.doi.org/10.1145/1060745.1060754


Author information

Authors and Affiliations

IBM Research Laboratory Brazil, Rio de Janeiro, RJ, Brazil
Marcos R. Vieira
Department of Computer Science and Engineering, Bourns College of Engineering, University of California, Riverside, CA, USA
Vassilis J. Tsotras


Corresponding author

Correspondence to Marcos R. Vieira.

About this chapter

Cite this chapter

Vieira, M.R., Tsotras, V.J. (2013). Diversified Pattern Queries. In: Spatio-Temporal Databases. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-02408-0_5


DOI: https://doi.org/10.1007/978-3-319-02408-0_5
Published: 16 October 2013
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02407-3
Online ISBN: 978-3-319-02408-0
eBook Packages: Computer Science, Computer Science (R0)
