Abstract
Databases and documents are commonly isolated from each other, managed by Database Management Systems (DBMS) and Information Retrieval Systems (IRS), respectively. However, both kinds of system are likely to store data about the same entities, which is a strong argument in favor of their integration. We propose a DBMS-IRS integration approach that uses terms from DBMS queries as keywords for IRS searches, retrieving documents strongly related to the queries. The IRS keywords are built by expanding an initial set of user-provided keywords with top-ranked terms found in the query result; the terms are ranked by a measure of term diffusion over the query result. Our experiments in two different domains show the effectiveness of the approach in comparison with other DBMS-IRS integration methods and with other term-ranking methods.
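The keyword-expansion step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the diffusion measure, so this sketch assumes a stand-in: a term's score is the fraction of query-result rows in which it occurs, with higher scores ranked first. The function names (`tokenize`, `rank_terms_by_diffusion`, `expand_keywords`), the stopword list, and the parameter `k` are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list, not the one used in the paper.
STOPWORDS = {"the", "of", "and", "a", "in", "to"}


def tokenize(value):
    """Lowercase a cell value and split it into alphanumeric terms."""
    terms = re.findall(r"[a-z0-9]+", str(value).lower())
    return [t for t in terms if t not in STOPWORDS]


def rank_terms_by_diffusion(rows):
    """Rank terms by an assumed diffusion score.

    The score used here is the fraction of result rows containing the term.
    This is a stand-in for the paper's measure, which the abstract does not
    specify. Ties are broken alphabetically.
    """
    if not rows:
        return []
    row_counts = Counter()
    for row in rows:
        terms_in_row = set()
        for value in row:
            terms_in_row.update(tokenize(value))
        row_counts.update(terms_in_row)  # each term counted once per row
    total = len(rows)
    scored = [(term, count / total) for term, count in row_counts.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


def expand_keywords(user_keywords, rows, k=3):
    """Append up to k top-ranked result terms to the user-provided keywords."""
    seen = {kw.lower() for kw in user_keywords}
    expanded = list(user_keywords)
    added = 0
    for term, _score in rank_terms_by_diffusion(rows):
        if added >= k:
            break
        if term not in seen:
            expanded.append(term)
            seen.add(term)
            added += 1
    return expanded


if __name__ == "__main__":
    # Hypothetical query result: (author, topic) tuples.
    result_rows = [
        ("Alan Turing", "computability theory"),
        ("Alonzo Church", "lambda calculus theory"),
        ("Kurt Godel", "incompleteness theory"),
    ]
    print(expand_keywords(["logic"], result_rows, k=2))
```

The expanded keyword list would then be submitted to the IRS as an ordinary keyword search. Any other term-ranking function can be swapped in for `rank_terms_by_diffusion` without changing the expansion logic.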
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Catão, V.S., Sampaio, M.C., Schiel, U. (2015). Retrieving Documents Related to Database Queries. In: Italiano, G.F., Margaria-Steffen, T., Pokorný, J., Quisquater, J.-J., Wattenhofer, R. (eds) SOFSEM 2015: Theory and Practice of Computer Science. SOFSEM 2015. Lecture Notes in Computer Science, vol 8939. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-46078-8_41
DOI: https://doi.org/10.1007/978-3-662-46078-8_41
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-46077-1
Online ISBN: 978-3-662-46078-8
eBook Packages: Computer Science, Computer Science (R0)