Abstract
Précis queries represent a novel way of accessing data that combines ideas and techniques from the fields of databases and information retrieval. They are free-form, keyword-based queries over relational databases that generate entire multi-relation databases, each a logical subset of the original one. A logical subset contains not only items directly related to the given query keywords but also items implicitly related to them in various ways, with the purpose of giving the user much greater insight into the original data. In this paper, we lay the foundations for the concept of logical database subsets generated from précis queries, under a generalized perspective that removes several restrictions of previous work. In particular, we extend the semantics of précis queries so that they may contain multiple terms combined through the AND, OR, and NOT operators. On the basis of these extended semantics, we define the concept of a logical database subset, identify the one that is most relevant to a given query, and provide algorithms for its generation. Finally, we present an extensive set of experimental results that demonstrate the efficiency and benefits of our approach.
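To make the idea of a logical subset more concrete, the following is a minimal sketch in Python, not the algorithm from the paper. It assumes a toy in-memory database, a hypothetical foreign key from MOVIE to DIRECTOR, and a deliberately simplified semantics in which the AND, OR, and NOT operators are evaluated against each tuple separately; all relation names, attribute names, and function names are invented for illustration. Tuples matching the query are collected first, and tuples joined to them through one foreign-key hop are then added as the implicitly related items.

```python
from typing import Dict, List

Row = Dict[str, object]
Database = Dict[str, List[Row]]

# Toy data (hypothetical): MOVIE.director_id references DIRECTOR.id.
DB: Database = {
    "DIRECTOR": [
        {"id": 1, "name": "Woody Allen"},
        {"id": 2, "name": "Sofia Coppola"},
    ],
    "MOVIE": [
        {"id": 10, "title": "Match Point", "genre": "drama", "director_id": 1},
        {"id": 11, "title": "Annie Hall", "genre": "comedy", "director_id": 1},
        {"id": 12, "title": "Lost in Translation", "genre": "drama", "director_id": 2},
    ],
}

# (child relation, foreign-key attribute, parent relation, primary-key attribute)
FOREIGN_KEYS = [("MOVIE", "director_id", "DIRECTOR", "id")]


def matches(row: Row, query) -> bool:
    """Evaluate a query against one tuple (a simplifying assumption).

    A query is a keyword string, ("AND", q1, q2), ("OR", q1, q2), or ("NOT", q).
    """
    if isinstance(query, str):
        text = " ".join(str(value) for value in row.values()).lower()
        return query.lower() in text
    operator = query[0]
    if operator == "AND":
        return matches(row, query[1]) and matches(row, query[2])
    if operator == "OR":
        return matches(row, query[1]) or matches(row, query[2])
    if operator == "NOT":
        return not matches(row, query[1])
    raise ValueError(f"unknown operator: {operator!r}")


def logical_subset(db: Database, query) -> Database:
    """Return matching tuples plus tuples one foreign-key hop away from them."""
    # Tuples directly related to the query.
    seeds = {name: [r for r in rows if matches(r, query)] for name, rows in db.items()}
    subset = {name: list(rows) for name, rows in seeds.items()}
    # Tuples implicitly related to the query, followed in both join directions.
    for child, fk, parent, pk in FOREIGN_KEYS:
        for child_row in db[child]:
            for parent_row in db[parent]:
                if child_row[fk] != parent_row[pk]:
                    continue
                if child_row in seeds[child] and parent_row not in subset[parent]:
                    subset[parent].append(parent_row)
                if parent_row in seeds[parent] and child_row not in subset[child]:
                    subset[child].append(child_row)
    # Drop relations that contribute no tuples.
    return {name: rows for name, rows in subset.items() if rows}


if __name__ == "__main__":
    answer = logical_subset(DB, ("AND", "drama", ("NOT", "translation")))
    for relation, rows in answer.items():
        print(relation, rows)
```

In this sketch the answer is itself a small multi-relation database: the query selects one MOVIE tuple directly, and the DIRECTOR tuple it references is included even though it contains none of the keywords. A fuller treatment would also need to decide how far to expand and how to rank the candidate subsets, which this sketch leaves out.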
About this article
Cite this article
Simitsis, A., Koutrika, G. & Ioannidis, Y. Précis: from unstructured keywords as queries to structured databases as answers. The VLDB Journal 17, 117–149 (2008). https://doi.org/10.1007/s00778-007-0075-9
DOI: https://doi.org/10.1007/s00778-007-0075-9