Abstract
Traditional information search, in which queries are posed against a known and rigid schema over a structured database, is shifting toward a Web scenario in which exposed schemas are vague or absent and data come from heterogeneous sources. In this setting, query answering cannot be precise and needs to be relaxed so that user requests can be matched with the data that are actually accessible. In this paper, we propose a logical model and a class of abstract query languages as a foundation for querying relational data sets with vague schemas. Our approach relies on the availability of taxonomies, that is, simple classifications of terms arranged in a hierarchical structure. The model is a natural extension of the relational model in which data domains are organized in hierarchies according to different levels of generalization between terms. We first propose a conservative extension of the relational algebra for this model, in which special operators allow the specification of relaxed queries over vaguely structured information. We also study equivalence and rewriting properties of the algebra that can be used for query optimization. We then illustrate a logic-based query language that can provide a basis for expressing relaxed queries declaratively. Finally, we investigate the expressive power of the proposed query languages and the independence of the taxonomy in this context.
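The core idea of relaxing a query through a taxonomy can be illustrated with a minimal Python sketch. It is not the algebra or the operators defined in the paper; it assumes a taxonomy stored as a hypothetical child-to-parent dictionary (`TAXONOMY`), a relation stored as a list of dicts, and hypothetical helpers `ancestors`, `relaxed_select`, and `relax_query`. A selection on a term matches every value that the term generalizes, and when the original constant yields no answer, the constant is generalized one level at a time.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical taxonomy (illustration only): each term maps to its immediate
# generalization; a term mapped to None is a root. Assumed to be acyclic.
TAXONOMY: Dict[str, Optional[str]] = {
    "Milan": "Lombardy",
    "Bergamo": "Lombardy",
    "Rome": "Lazio",
    "Lombardy": "Italy",
    "Lazio": "Italy",
    "Italy": None,
}

Row = Dict[str, str]  # a tuple of a relation, keyed by attribute name


def ancestors(term: str, taxonomy: Dict[str, Optional[str]]) -> List[str]:
    """Return the term followed by all of its generalizations, bottom-up."""
    chain: List[str] = []
    current: Optional[str] = term
    while current is not None:
        chain.append(current)
        current = taxonomy.get(current)  # terms outside the taxonomy have no parent
    return chain


def relaxed_select(relation: List[Row], attribute: str, constant: str,
                   taxonomy: Dict[str, Optional[str]]) -> List[Row]:
    """Keep rows whose value for `attribute` equals `constant` or specializes it."""
    return [row for row in relation
            if constant in ancestors(row[attribute], taxonomy)]


def relax_query(relation: List[Row], attribute: str, constant: str,
                taxonomy: Dict[str, Optional[str]]) -> Tuple[Optional[str], List[Row]]:
    """Generalize `constant` one level at a time until the selection is non-empty.

    Returns the term actually used and the answer, or (None, []) if even the
    most general ancestor of `constant` matches nothing.
    """
    for term in ancestors(constant, taxonomy):
        answer = relaxed_select(relation, attribute, term, taxonomy)
        if answer:
            return term, answer
    return None, []


# Hypothetical data: car-pooling offers located by city.
rides: List[Row] = [
    {"driver": "Anna", "city": "Bergamo"},
    {"driver": "Luca", "city": "Rome"},
]

print(relaxed_select(rides, "city", "Milan", TAXONOMY))  # []
print(relax_query(rides, "city", "Milan", TAXONOMY))
# ('Lombardy', [{'driver': 'Anna', 'city': 'Bergamo'}])
```

With an empty taxonomy, `relaxed_select` reduces to an ordinary equality selection, which loosely mirrors the idea of a conservative extension of the relational algebra: relaxation only changes answers where the taxonomy relates the terms involved.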
Notes
“MOKA: an infrastructure for public transit integrated car pooling”, a project funded by Politecnico di Milano. Website: http://moka.necst.it/app/index.html
GenData 2020 (“Data-Driven Genomic Computing”) is a project funded by MIUR (Italian Ministry of Education, University and Research) involving a large consortium of Italian universities: see http://gendata.weebly.com/
Acknowledgments
The authors acknowledge support from the EC’s FP7 “CUbRIK” project and from the Italian “GenData” PRIN project.
About this article
Cite this article
Martinenghi, D., Torlone, R. Taxonomy-based relaxation of query answering in relational databases. The VLDB Journal 23, 747–769 (2014). https://doi.org/10.1007/s00778-013-0350-x
DOI: https://doi.org/10.1007/s00778-013-0350-x