Exemplar queries: a new way of searching

  • Regular Paper
The VLDB Journal

Abstract

Modern search engines employ advanced techniques that go beyond the structures that strictly satisfy the query conditions in an effort to better capture the user intentions. In this work, we introduce a novel query paradigm that considers a user query as an example of the data in which the user is interested. We call these queries exemplar queries. We provide a formal specification of their semantics and show that they are fundamentally different from notions like queries by example, approximate queries and related queries. We provide an implementation of these semantics for knowledge graphs and present an exact solution with a number of optimizations that improve performance without compromising the result quality. We study two different congruence relations, isomorphism and strong simulation, for identifying the answers to an exemplar query. We also provide an approximate solution that prunes the search space and achieves considerably better time performance with minimal or no impact on effectiveness. The effectiveness and efficiency of these solutions with synthetic and real datasets are experimentally evaluated, and the importance of exemplar queries in practice is illustrated.
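The idea of treating the query as an example can be illustrated with a minimal sketch. It assumes a toy knowledge graph stored as a set of (source, label, target) triples and uses a brute-force, edge-label-preserving isomorphism test. The data, the function names `nodes_of` and `exemplar_answers`, and the exhaustive search strategy are illustrative assumptions, not the optimized algorithms of the paper.

```python
from itertools import permutations

# Hypothetical toy knowledge graph: a set of (source, label, target) edges.
GRAPH = {
    ("Google", "acquired", "YouTube"),
    ("Yahoo", "acquired", "Tumblr"),
    ("Facebook", "acquired", "WhatsApp"),
    ("Google", "founded_in", "California"),
}


def nodes_of(edges):
    """Return the set of nodes touched by a collection of labeled edges."""
    return {n for s, _, t in edges for n in (s, t)}


def exemplar_answers(graph, sample):
    """Return every subgraph of `graph` that is the image of `sample`
    under an injective, edge-label-preserving node mapping.

    The sample itself is included in the result in this sketch.
    """
    sample_nodes = sorted(nodes_of(sample))
    answers = set()
    # Try every injective assignment of sample nodes to graph nodes.
    for image in permutations(sorted(nodes_of(graph)), len(sample_nodes)):
        mapping = dict(zip(sample_nodes, image))
        mapped = frozenset((mapping[s], lbl, mapping[t]) for s, lbl, t in sample)
        # Keep the mapping only if every mapped edge exists in the graph.
        if mapped <= graph:
            answers.add(mapped)
    return answers


if __name__ == "__main__":
    sample = {("Google", "acquired", "YouTube")}
    for answer in sorted(exemplar_answers(GRAPH, sample), key=sorted):
        print(sorted(answer))
```

With the single-edge sample above, the sketch returns the three "acquired" edges, one of which is the sample itself. The exhaustive enumeration is exponential in the sample size, which is the kind of cost the pruning and approximation techniques mentioned in the abstract aim to avoid.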

Notes

  1. In the rest of the document, we drop the qualifier “edge-preserving.”

  2. A subgraph induced by a set of nodes \(N\) is the subgraph whose edges have both endpoints in \(N\).

  3. Note that those nodes are removed later, when the actual isomorphism check is performed.

  4. http://www.gregsadetsky.com/aol-data.

  5. List of queries: http://www.mi.parisdescartes.fr/~themisp/exemplarquery-ext/.

  6. https://github.com/mutandon/Grava.

  7. For ease of exposition, we do not report the complete list of entities in the answer.
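
The definition of an induced subgraph in Note 2 can be made concrete with a minimal sketch, assuming edges are stored as (source, label, target) triples. The function name `induced_subgraph` and the sample data are hypothetical and serve only as an illustration.

```python
def induced_subgraph(edges, nodes):
    """Keep only the edges whose source and target both lie in `nodes`."""
    nodes = set(nodes)
    return {(s, lbl, t) for s, lbl, t in edges if s in nodes and t in nodes}


if __name__ == "__main__":
    # Hypothetical triangle a -> b -> c -> a with a single edge label "r".
    edges = {("a", "r", "b"), ("b", "r", "c"), ("c", "r", "a")}
    # Only the edge between a and b has both endpoints in {a, b}.
    print(induced_subgraph(edges, {"a", "b"}))
```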

References

  1. Agrawal, R., Gollapudi, S., Halverson, A., Ieong, S.: Diversifying search results. In: WSDM (2009)

  2. Anagnostopoulos, A., Becchetti, L., Castillo, C., Gionis, A.: An optimization framework for query recommendation. In: WSDM (2010)

  3. Baeza-Yates, R., Boldi, P., Castillo, C.: Generalizing pagerank: damping functions for link-based ranking algorithms. In: SIGIR (2006)

  4. Bedini, I., Elser, B., Velegrakis, Y.: The trento big data platform for public administration and large companies: use cases and opportunities. In: PVLDB, vol. 6(11) (2013)

  5. Beeri, C., Milo, T.: Schemas for integration and translation of structured and semi-structured data. In: ICDT. Springer, Berlin (1999)

  6. Bergamaschi, S., Domnori, E., Guerra, F., Trillo Lado, R., Velegrakis, Y.: Keyword search over relational databases: a metadata approach. In: SIGMOD (2011)

  7. Bergamaschi, S., Guerra, F., Rota, S., Velegrakis, Y.: A hidden markov model approach to keyword-based search over relational databases. In: ER (2011)

  8. Bhatia, S., Majumdar, D., Mitra, P.: Query suggestions in the absence of query logs. In: SIGIR (2011)

  9. Boldi, P., Bonchi, F., Castillo, C., Vigna, S.: Query reformulation mining: models, patterns, and applications. Inf. Retr. 14(3), 257 (2011)

  10. Bordino, I., De Francisci Morales, G., Weber, I., Bonchi, F.: From machu_picchu to rafting the urubamba river: anticipating information needs via the entity-query graph. In: WSDM (2013)

  11. Chakrabarti, S.: Dynamic personalized pagerank in entity-relation graphs. In: WWW (2007)

  12. Cook, S. A.: The complexity of theorem-proving procedures. In: Symposium on Theory of Computing (1971)

  13. Dimitriadou, K., Papaemmanouil, O., Diao, Y.: Explore-by-example: an automatic query steering framework for interactive data exploration. In: SIGMOD (2014)

  14. Dong, X., Halevy, A.Y., Madhavan, J.: Reference reconciliation in complex information spaces. In: SIGMOD (2005)

  15. Dou, Z., Hu, S., Luo, Y., Song, R., Wen, J.: Finding dimensions for queries. In: CIKM, pp. 1311–1320 (2011)

  16. Fan, W., Li, J., Ma, S., Wang, H., Wu, Y.: Graph homomorphism revisited for graph matching. PVLDB 3(1–2), 1161 (2010)

  17. Gallego, M.A., Fernández, J.D., Martínez-Prieto, M.A., de la Fuente, P.: An empirical study of real-world SPARQL queries. In: USEWOD Workshop, WWW (2011)

  18. Gao, X., Xiao, B., Tao, D., Li, X.: A survey of graph edit distance. Pattern Anal. Appl. 13(1), 113 (2010)

  19. Gauch, S., Smith, J.B.: Search improvement via automatic query reformulation. TOIS 9(3), 249–280 (1991)

  20. Google. Freebase data dumps. https://developers.google.com/freebase/data (2014)

  21. Haveliwala, T. H.: Topic-sensitive pagerank. In: WWW (2002)

  22. Henzinger, M. R., Henzinger, T. A., Kopke, P. W.: Computing simulations on finite and infinite graphs. In: FOCS (1995)

  23. Hogan, A., Mellotte, M., Powell, G., Stampouli, D.: Towards fuzzy query-relaxation for rdf. In: The Semantic Web: Research and Applications, pp. 687–702. Springer, Berlin (2012)

  24. Jansen, B., Booth, D., Spink, A.: Determining the informational, navigational, and transactional intent of web queries. Inf. Process. Manag. 44, 1251 (2008)

  25. Jeh, G., Widom, J.: Scaling personalized web search. In: WWW (2003)

  26. Kargar, M., An, A.: Keyword search in graphs: finding r-cliques. Proc. VLDB Endow. 4(10), 681 (2011)

  27. Kasneci, G., Ramanath, M., Sozio, M., Suchanek, F.M., Weikum, G.: Star: Steiner-tree approximation in relationship graphs. In: ICDE (2009)

  28. Khan, A., Li, N., Yan, X., Guan, Z., Chakraborty, S., Tao, S.: Neighborhood based fast graph search in large networks. In: SIGMOD (2011)

  29. Khan, A., Wu, Y., Aggarwal, C.C., Yan, X.: Nema: Fast graph search with label similarity. In: PVLDB (2013)

  30. Lao, N., Cohen, W.W.: Fast query execution for retrieval models based on path-constrained random walks. In: KDD (2010)

  31. Lissandrini, M., Mottin, D., Palpanas, T., Papadimitriou, D., Velegrakis, Y.: Unleashing the power of information graphs. SIGMOD Rec. 43(4), 21 (2015)

  32. Ma, S., Cao, Y., Fan, W., Huai, J., Wo, T.: Strong simulation: capturing topology in graph pattern matching. TODS 39(1), 4 (2014)

  33. Mishra, C., Koudas, N.: Interactive query refinement. In: EDBT (2009)

  34. Mottin, D., Lissandrini, M., Velegrakis, Y., Palpanas, T.: Exemplar queries: give me an example of what you need. Proc. VLDB Endow. 7(5), 365 (2014)

  35. Mottin, D., Lissandrini, M., Velegrakis, Y., Palpanas, T.: Searching with XQ: the exemplar query search engine. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, pp. 901–904. ACM, NY, USA (2014)

  36. Mottin, D., Marascu, A., Roy, S.B., Das, G., Palpanas, T., Velegrakis, Y.: A probabilistic optimization framework for the empty-answer problem. Proc. VLDB Endow. 6(14), 1762–1773 (2013)

  37. Mottin, D., Palpanas, T., Velegrakis, Y.: Entity Ranking Using Click-Log Information. Intel. Data Anal. J. 17(5), 837 (2013)

  38. Ngo, V. M., Cao, T. H.: Ontology-based query expansion with latently related named entities for semantic text search. In: IJIIDS (2010)

  39. Page, L., Brin, S., Motwani, R., Winograd, T.: The pagerank citation ranking: bringing order to the web. TR 1999-66, Stanford InfoLab (1999)

  40. Park, D.: Concurrency and Automata on Infinite Sequences. Springer, Berlin (1981)

  41. Pound, J., Hudek, A. K., Ilyas, I. F., Weddell, G.: Interpreting keyword queries over web knowledge bases. In: CIKM (2012)

  42. Qiu, Y., Frei, H.-P.: Concept based query expansion. In: SIGIR (1993)

  43. Shannon, C.E.: A mathematical theory of communication. SIGMOBILE Mob. Comput. Commun. Rev. 5(1), 3–55 (2001)

  44. Shen, Y., Chakrabarti, K., Jones, M.: Discovering queries based on example tuples. In: SIGMOD (2014)

  45. Ullmann, J.R.: An algorithm for subgraph isomorphism. J. ACM 23(1), 31–42 (1976)

  46. Vallet, D., Zaragoza, H.: Inferring the most important types of a query: a semantic approach. In: SIGIR, pp. 857–858 (2008)

  47. Wang, X., Ding, X., Tung, A. K. H., Ying, S., Jin, H.: An efficient graph indexing method. In: ICDE, pp. 210–221 (2012)

  48. Wang, X., Zhai, C.: Mining term association patterns from search logs for effective query reformulation. In: CIKM, pp. 479–488 (2008)

  49. Xing, W., Ghorbani, A.: Weighted pagerank algorithm. In: CNSR, pp. 305–314 (2004)

  50. Yan, X., Yu, P. S., Han, J.: Graph indexing: a frequent structure-based approach. In: SIGMOD (2004)

  51. Yang, S., Wu, Y., Sun, H., Yan, X.: Schemaless and structureless graph querying. Proc. VLDB Endow. 7(7), 565 (2014)

  52. Zhao, P., Han, J.: On graph query optimization in large networks. Proc. VLDB Endow. 3(1–2), 340–351 (2010)

  53. Zloof, M. M.: Query by example. In: AFIPS NCC, pp. 431–438 (1975)

Acknowledgments

This work was partially supported by the Trento RISE Big Data Project [4] and the KEYSTONE COST Action IC1302. We would like to thank the authors of [10], NeMa [29] and strong simulation [32] for kindly providing us with their code. We thank Paola Quaglia for the valuable discussions and suggestions about simulation.

Author information

Corresponding author

Correspondence to Matteo Lissandrini.

Additional information

Y. Velegrakis was partially supported by the ERC grant Lucretius and the KEYSTONE COST Action.

About this article

Cite this article

Mottin, D., Lissandrini, M., Velegrakis, Y. et al. Exemplar queries: a new way of searching. The VLDB Journal 25, 741–765 (2016). https://doi.org/10.1007/s00778-016-0429-2

  • DOI: https://doi.org/10.1007/s00778-016-0429-2
