Diversified Pattern Queries

Spatio-Temporal Databases

Part of the book series: SpringerBriefs in Computer Science ((BRIEFSCOMPUTER))

Abstract

In this chapter we describe a general framework, called DivDB, for the evaluation and optimization of methods for diversifying query results. In these methods, an initial ranked candidate set produced by a query is used to construct a result set. The elements in the result set are ranked with respect to relevance and diversity features, i.e., the retrieved elements should be as relevant as possible to the query and, at the same time, the result set should be as diverse as possible. While addressing relevance is relatively simple and has been heavily studied, diversity is a harder problem to solve. One major contribution of this work is that we adapt, implement and evaluate several existing methods for diversifying query results in the DivDB framework. We also propose two new approaches, namely the Greedy with Marginal Contribution (GMC) and the Greedy Randomized with Neighborhood Expansion (GNE) methods. Both methods iteratively construct a result set using a scoring function that ranks candidate elements not only by relevance and diversity with respect to the existing result set, but also by diversity against the remaining candidates. We also present the first thorough experimental evaluation of the various diversification techniques implemented in the DivDB framework. We examine the methods’ performance with respect to precision, running time and quality of the result. Our experimental results show that while the proposed methods have higher running times, they achieve precision very close to the optimal, while also providing the best result quality. While GMC is deterministic, the randomized approach (GNE) can achieve better result quality if the user is willing to trade off running time.
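To make the relevance/diversity trade-off concrete, the following is a minimal sketch of a generic greedy diversification loop. It is not the chapter's GMC or GNE method: the scoring rule (a weighted sum of relevance and the distance to the closest item already selected), the function name `greedy_diversify`, and the parameter `lam` are all assumptions chosen for illustration. In particular, this sketch does not account for diversity against the remaining candidates, which the abstract names as the distinguishing feature of the proposed methods.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def greedy_diversify(
    candidates: Sequence[T],
    relevance: Callable[[T], float],
    distance: Callable[[T, T], float],
    k: int,
    lam: float = 0.5,
) -> List[T]:
    """Greedily pick up to k items from a candidate set.

    Hypothetical scoring rule (an assumption, not taken from the chapter):
        score(c) = (1 - lam) * relevance(c) + lam * min_dist(c, result)
    lam = 0 ranks by relevance only; larger lam favors diversity.
    """
    remaining = list(candidates)
    result: List[T] = []

    while remaining and len(result) < k:

        def score(c: T) -> float:
            # Diversity term: distance to the closest selected item,
            # taken as 0.0 while the result set is still empty.
            div = min((distance(c, r) for r in result), default=0.0)
            return (1.0 - lam) * relevance(c) + lam * div

        best = max(remaining, key=score)
        result.append(best)
        remaining.remove(best)

    return result
```

With `lam = 0` the loop degenerates to a plain top-k by relevance, so the parameter makes the cost of diversification easy to observe on a small candidate set.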

Notes

  1. These two terms are used interchangeably throughout the text.

  2. This is a simplified version of the Buffered Greedy Approach [16]. In our preliminary experiments, the results of both approaches were very similar, except that the one described here is several orders of magnitude faster than the Buffered Greedy.

References

  1. Agrawal, R., Gollapudi, S., Halverson, A., Ieong, S.: Diversifying search results. In: Proceedings of the ACM International Conference on Web Search and Data Mining (WSDM), pp. 5–14. ACM (2009). DOI http://dx.doi.org/10.1145/1498759.1498766

  2. Carbonell, J., Goldstein, J.: The use of MMR, diversity-based reranking for reordering documents and producing summaries. In: Proceedings of the ACM SIGIR International Conference on Research and Development in Information Retrieval, pp. 335–336. ACM (1998). DOI http://dx.doi.org/10.1145/290941.291025

  3. Carterette, B.: An analysis of NP-completeness in novelty and diversity ranking. Information Retrieval 14(1), 89–106 (2011). DOI http://dx.doi.org/10.1007/s10791-010-9157-1

  4. Chapelle, O., Metzler, D., Zhang, Y., Grinspan, P.: Expected reciprocal rank for graded relevance. In: Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM), pp. 621–630 (2009). DOI http://dx.doi.org/10.1145/1645953.1646033

  5. Chen, H., Karger, D.: Less is more: probabilistic models for retrieving fewer relevant documents. In: Proceedings of the ACM SIGIR International Conference on Research and Development in Information Retrieval, pp. 429–436. ACM (2006). DOI http://dx.doi.org/10.1145/1148170.1148245

  6. Clarke, C., Kolla, M., Cormack, G., Vechtomova, O., Ashkan, A., Büttcher, S., MacKinnon, I.: Novelty and diversity in information retrieval evaluation. In: Proceedings of the ACM SIGIR International Conference on Research and Development in Information Retrieval, pp. 659–666. ACM (2008). DOI http://dx.doi.org/10.1145/1390334.1390446

  7. Coyle, M., Smyth, B.: On the importance of being diverse. In: Z. Shi, Q. He (eds.) Intelligent Information Processing II, IFIP International Federation for Information Processing, vol. 163, pp. 341–350. Springer US (2005). DOI http://dx.doi.org/10.1007/0-387-23152-8_43

  8. Demidova, E., Fankhauser, P., Zhou, X., Nejdl, W.: DivQ: Diversification for keyword search over structured databases. In: Proceedings of the ACM SIGIR International Conference on Research and Development in Information Retrieval, pp. 331–338. ACM (2010). DOI http://dx.doi.org/10.1145/1835449.1835506

  9. Drosou, M., Pitoura, E.: Diversity over continuous data. IEEE Data Eng. Bull. 32(4), 49–56 (2009)

  10. Feo, T.A., Resende, M.G.C.: Greedy randomized adaptive search procedures. J. of Global Optimization 6(2), 109–133 (1995). DOI http://dx.doi.org/10.1007/BF01096763

  11. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York, NY, USA (1990)

  12. Gollapudi, S., Sharma, A.: An axiomatic approach for result diversification. In: Proceedings of the International Conference on World Wide Web (WWW), pp. 381–390. ACM (2009). DOI http://dx.doi.org/10.1145/1526709.1526761

  13. Hadjieleftheriou, M., Tsotras, V.J.: Letter from the special issue on result diversity. IEEE Data Eng. Bull. 32(4), 6 (2009)

  14. Hassin, R., Rubinstein, S., Tamir, A.: Approximation algorithms for maximum dispersion. Oper. Res. Lett. 21(3), 133–137 (1997). DOI http://dx.doi.org/10.1016/S0167-6377(97)00034-5

  15. Ioannou, E., Papapetrou, O., Skoutas, D., Nejdl, W.: Efficient semantic-aware detection of near duplicate resources. In: Proceedings of the Extended Semantic Web Conference (ESWC), LNCS, pp. 136–150. Springer (2010)

  16. Jain, A., Sarda, P., Haritsa, J.: Providing diversity in k-nearest neighbor query results. In: H. Dai, R. Srikant, C. Zhang (eds.) Advances in Knowledge Discovery and Data Mining, Lecture Notes in Computer Science, vol. 3056, pp. 404–413. Springer, Berlin Heidelberg (2004). DOI http://dx.doi.org/10.1007/978-3-540-24775-3_49

  17. Kuby, M.J.: Programming models for facility dispersion: The p-dispersion and maxisum dispersion problems. Geogr. Analysis 19, 315–329 (1987). DOI http://dx.doi.org/10.1111/j.1538-4632.1987.tb00133.x

  18. Kuby, M.J.: Programming models for facility dispersion: the p-dispersion and maxisum dispersion problems. Mathematical and Computer Modelling 10(10), 792 - (1988). DOI http://dx.doi.org/10.1016/0895-7177(88)90094-5

  19. Kuo, C.C., Glover, F., Dhir, K.S.: Analyzing and modeling the maximum diversity problem by zero-one programming. Decision Sciences 24(6), 1171–1185 (1993). DOI http://dx.doi.org/10.1111/j.1540-5915.1993.tb00509.x

  20. Laguna, M., Martí, R.: GRASP and path relinking for 2-layer straight line crossing minimization. INFORMS Journal on Computing 11(1), 44–52 (1999)

  21. van Leuken, R., Garcia, L., Olivares, X., van Zwol, R.: Visual diversification of image search results. In: Proceedings of the International Conference on World Wide Web (WWW), pp. 341–350. ACM (2009). DOI http://dx.doi.org/10.1145/1526709.1526756

  22. Ley, M.: The DBLP Computer Science Bibliography. www.informatik.uni-trier.de/~ley/db

  23. Liu, K., Terzi, E., Grandison, T.: Highlighting diverse concepts in documents. In: Proceedings of the SIAM International Conference on Data Mining (SDM), pp. 545–556. SIAM (2009)

  24. Liu, Z., Sun, P., Chen, Y.: Structured search result differentiation. Proceedings of the VLDB Endowment (PVLDB) 2(1), 313–324 (2009)

  25. Prais, M., Ribeiro, C.: Reactive GRASP: An application to a matrix decomposition problem in TDMA traffic assignment. INFORMS Journal on Computing 12(3), 164–176 (2000). DOI http://dx.doi.org/10.1287/ijoc.12.3.164.12639

  26. Prokopyev, O., Kong, N., Martinez-Torres, D.: The equitable dispersion problem. European J. of Operational Research 197(1), 59–67 (2009)

  27. Radlinski, F., Dumais, S.: Improving personalized web search using result diversification. In: Proceedings of the ACM SIGIR International Conference on Research and Development in Information Retrieval, pp. 691–692. ACM (2006). DOI http://dx.doi.org/10.1145/1148170.1148320

  28. Radlinski, F., Kleinberg, R., Joachims, T.: Learning diverse rankings with multi-armed bandits. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 784–791. ACM (2008). DOI http://dx.doi.org/10.1145/1390156.1390255

  29. Rafiei, D., Bharat, K., Shukla, A.: Diversifying web search results. In: Proceedings of the International Conference on World Wide Web (WWW), pp. 781–790. ACM (2010). DOI http://dx.doi.org/10.1145/1772690.1772770

  30. Resende, M.G.C., Ribeiro, C.C.: Greedy randomized adaptive search procedures: Advances, hybridizations, and applications. In: M. Gendreau, J.Y. Potvin (eds.) Handbook of Metaheuristics, International Series in Operations Research & Management Science, vol. 146, 2 edn., pp. 283–320. Springer US (2010). DOI http://dx.doi.org/10.1007/978-1-4419-1665-5_10

  31. Santos, R., Macdonald, C., Ounis, I.: Exploiting query reformulations for web search result diversification. In: Proceedings of the International Conference on World Wide Web (WWW), pp. 881–890. ACM (2010). DOI http://dx.doi.org/10.1145/1772690.1772780

  32. Santos, R., Peng, J., Macdonald, C., Ounis, I.: Explicit search result diversification through sub-queries. In: C. Gurrin, Y. He, G. Kazai, U. Kruschwitz, S. Little, T. Roelleke, S. Rüger, K. Rijsbergen (eds.) Advances in Information Retrieval, Lecture Notes in Computer Science, vol. 5993, pp. 87–99. Springer, Berlin Heidelberg (2010). DOI http://dx.doi.org/10.1007/978-3-642-12275-0_11

  33. Silva, G., de Andrade, M., Ochi, L., Martins, S., Plastino, A.: New heuristics for the maximum diversity problem. J. of Heuristics 13(4), 315–336 (2007). DOI http://dx.doi.org/10.1007/s10732-007-9010-x

  34. Smyth, B., McClave, P.: Similarity vs. diversity. In: Proceedings of the International Conference on Case-Based Reasoning: Case-Based Reasoning Research and Development (ICCBR), pp. 347–361 (2001)

  35. The Text REtrieval Conference (TREC): Trec-3 collection. http://trec.nist.gov

  36. Vee, E., Srivastava, U., Shanmugasundaram, J., Bhat, P., Amer-Yahia, S.: Efficient computation of diverse query results. In: Proceedings of the IEEE International Conference on Data Engineering (ICDE), pp. 228–236. IEEE Computer Society (2008). DOI http://dx.doi.org/10.1109/ICDE.2008.4497431

  37. Yu, C., Lakshmanan, L., Amer-Yahia, S.: It takes variety to make a world: diversification in recommender systems. In: Proceedings of the International Conference on Extending Database Technology (EDBT), pp. 368–378. ACM (2009). DOI http://dx.doi.org/10.1145/1516360.1516404

  38. Zhai, C., Cohen, W., Lafferty, J.: Beyond independent relevance: methods and evaluation metrics for subtopic retrieval. In: Proceedings of the ACM SIGIR International Conference on Research and Development in Information Retrieval, pp. 10–17. ACM (2003). DOI http://dx.doi.org/10.1145/860435.860440

  39. Zhang, B., Li, H., Liu, Y., Ji, L., Xi, W., Fan, W., Chen, Z., Ma, W.Y.: Improving web search results using affinity graph. In: Proceedings of the ACM SIGIR International Conference on Research and Development in Information Retrieval, pp. 504–511. ACM (2005). DOI http://dx.doi.org/10.1145/1076034.1076120

  40. Zheng, Y., Zhang, L., Xie, X., Ma, W.Y.: Mining interesting locations and travel sequences from GPS trajectories. In: Proceedings of the International Conference on World Wide Web (WWW), pp. 791–800. ACM (2009). DOI http://dx.doi.org/10.1145/1526709.1526816

  41. Zhu, X., Goldberg, A.B., Gael, J.V., Andrzejewski, D.: Improving diversity in ranking using absorbing random walks. In: Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), pp. 97–104 (2007)

  42. Ziegler, C.N., McNee, S., Konstan, J., Lausen, G.: Improving recommendation lists through topic diversification. In: Proceedings of the International Conference on World Wide Web (WWW), pp. 22–32. ACM (2005). DOI http://dx.doi.org/10.1145/1060745.1060754

Author information

Corresponding author

Correspondence to Marcos R. Vieira.

Copyright information

© 2013 The Author(s)

About this chapter

Cite this chapter

Vieira, M.R., Tsotras, V.J. (2013). Diversified Pattern Queries. In: Spatio-Temporal Databases. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-02408-0_5

  • DOI: https://doi.org/10.1007/978-3-319-02408-0_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-02407-3

  • Online ISBN: 978-3-319-02408-0

  • eBook Packages: Computer Science (R0)
