Précis: from unstructured keywords as queries to structured databases as answers

  • Special Issue Paper
The VLDB Journal

Abstract

Précis queries represent a novel way of accessing data that combines ideas and techniques from the fields of databases and information retrieval. They are free-form, keyword-based queries over relational databases, and their answers are entire multi-relation databases that are logical subsets of the original ones. A logical subset contains not only items directly related to the given query keywords but also items implicitly related to them in various ways, with the aim of giving the user much greater insight into the original data. In this paper, we lay the foundations for the concept of logical database subsets generated from précis queries, under a generalized perspective that removes several restrictions of previous work. In particular, we extend the semantics of précis queries so that they may contain multiple terms combined through the AND, OR, and NOT operators. On the basis of these extended semantics, we define the concept of a logical database subset, identify the one that is most relevant to a given query, and provide algorithms for its generation. Finally, we present an extensive set of experimental results that demonstrate the efficiency and benefits of our approach.
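
To make the idea of a keyword query returning a small database more concrete, the following is a minimal Python sketch, not the algorithm of the paper. Everything in it is an assumption made for illustration: the toy `director` and `movie` relations, the `FOREIGN_KEYS` list, substring matching of terms against attribute values, evaluating AND, OR, and NOT per tuple, and expanding by exactly one foreign-key join. The paper's actual semantics, relevance criteria, and generation algorithms are given in the full text.

```python
# Toy illustration only: the schema, data, and matching rules below are
# assumptions, not the definitions or algorithms of the paper.

DB = {
    "director": [
        {"id": 1, "name": "Woody Allen"},
        {"id": 2, "name": "Sofia Coppola"},
    ],
    "movie": [
        {"id": 10, "title": "Match Point", "genre": "drama", "director_id": 1},
        {"id": 11, "title": "Scoop", "genre": "comedy", "director_id": 1},
        {"id": 12, "title": "Lost in Translation", "genre": "drama", "director_id": 2},
    ],
}

# Hypothetical foreign keys: (child relation, child column, parent relation).
FOREIGN_KEYS = [("movie", "director_id", "director")]


def matches(row, term):
    """True if any attribute value of the row contains the term (case-insensitive)."""
    return any(term.lower() in str(value).lower() for value in row.values())


def evaluate(query, row):
    """Evaluate a nested query such as ("AND", "drama", ("NOT", "lost")) on one row."""
    if isinstance(query, str):
        return matches(row, query)
    op, *args = query
    if op == "AND":
        return all(evaluate(arg, row) for arg in args)
    if op == "OR":
        return any(evaluate(arg, row) for arg in args)
    if op == "NOT":
        return not evaluate(args[0], row)
    raise ValueError(f"unknown operator: {op}")


def logical_subset(db, query):
    """Return directly matching rows plus rows one foreign-key join away from them."""
    direct = {rel: [r for r in rows if evaluate(query, r)] for rel, rows in db.items()}
    subset = {rel: list(rows) for rel, rows in direct.items()}

    def add(rel, row):
        if row not in subset[rel]:
            subset[rel].append(row)

    for child, column, parent in FOREIGN_KEYS:
        # Rows implicitly related to a direct match: follow the key in both directions.
        for child_row in direct[child]:
            for parent_row in db[parent]:
                if parent_row["id"] == child_row[column]:
                    add(parent, parent_row)
        for parent_row in direct[parent]:
            for child_row in db[child]:
                if child_row[column] == parent_row["id"]:
                    add(child, child_row)
    return subset


result = logical_subset(DB, ("AND", "drama", ("NOT", "lost")))
print(result)
```

In this sketch the result keeps the shape of the original schema: it contains the one movie that matches the query directly and, through the foreign key, the director tuple that is only implicitly related to the keywords.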

Author information

Corresponding author

Correspondence to Georgia Koutrika.

About this article

Cite this article

Simitsis, A., Koutrika, G. & Ioannidis, Y. Précis: from unstructured keywords as queries to structured databases as answers. The VLDB Journal 17, 117–149 (2008). https://doi.org/10.1007/s00778-007-0075-9

  • DOI: https://doi.org/10.1007/s00778-007-0075-9