Accessing the Deep Web with Keywords: A Foundational Approach

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10546)


The Deep Web is constituted by data that are generated dynamically as the result of interactions with Web pages. Accessing Deep Web data presents many challenges: it has been shown that answering even simple queries on such data requires the execution of recursive query plans. There is a gap between the theoretical understanding of this problem and the practical approaches to it. The main reason is that the problem has to be studied by considering the database as part of the input, while queries can be processed only by accessing the data according to limitations, expressed as so-called access patterns. In this paper we embark on the task of closing this gap by giving a precise definition that reflects the practical nature of accessing Deep Web data sources. In particular, we define the problem of querying Deep Web sources with keywords. We describe two scenarios: in the first, called unrestricted, the query answering algorithm has full access to the data; in the second, called restricted, the algorithm can access the data only according to the access patterns. We formalise the decision problem associated with query answering in the Deep Web, explaining its relevance in both of the aforementioned scenarios. We then present some complexity results.
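To make the contrast between the two scenarios concrete, the following Python sketch computes which data can be extracted under access patterns, starting from a set of initial keywords. It is a minimal illustration under stated assumptions, not the algorithm or formal definition from the paper: the `Source` class, the "i"/"o" pattern strings, and the functions `accessible_part` and `unrestricted_constants` are hypothetical names introduced here. The restricted case is modelled as a fixpoint in which constants returned by one access are reused as inputs to further accesses, which is why query plans must in general be recursive.

```python
from itertools import product
from typing import Dict, Iterable, List, Set, Tuple

Row = Tuple[str, ...]


class Source:
    """A hypothetical Deep Web source: a relation that can only be queried
    through its access pattern, e.g. "io". Positions marked 'i' must be bound
    to known constants before an access; positions marked 'o' are returned."""

    def __init__(self, name: str, pattern: str, tuples: List[Row]):
        self.name = name
        self.pattern = pattern
        self.tuples = tuples

    def access(self, binding: Row) -> List[Row]:
        # Simulate one access: return the tuples that agree with `binding`
        # on the input ('i') positions.
        inputs = [k for k, m in enumerate(self.pattern) if m == "i"]
        return [t for t in self.tuples
                if all(t[k] == v for k, v in zip(inputs, binding))]


def accessible_part(sources: List[Source], initial: Iterable[str]):
    """Restricted scenario (assumed model): starting from the initial keywords,
    access every source with every binding of known constants, adding the
    constants found in the results, until no new tuple is extracted."""
    known: Set[str] = set(initial)
    extracted: Dict[str, Set[Row]] = {s.name: set() for s in sources}
    tried: Set[Tuple[str, Row]] = set()
    changed = True
    while changed:
        changed = False
        for s in sources:
            n_inputs = s.pattern.count("i")
            # Bindings range over a sorted copy of the known constants,
            # so `known` can safely grow inside the loop.
            for binding in product(sorted(known), repeat=n_inputs):
                if (s.name, binding) in tried:
                    continue
                tried.add((s.name, binding))
                for t in s.access(binding):
                    if t not in extracted[s.name]:
                        extracted[s.name].add(t)
                        known.update(t)
                        changed = True
    return known, extracted


def unrestricted_constants(sources: List[Source]) -> Set[str]:
    """Unrestricted scenario: the algorithm sees every tuple directly."""
    return {c for s in sources for t in s.tuples for c in t}


# Example: a binary source R with pattern "io" and initial keyword "a".
R = Source("R", "io", [("a", "b"), ("b", "c"), ("d", "e")])
known, extracted = accessible_part([R], {"a"})
print(sorted(known))                        # ['a', 'b', 'c']
print(sorted(extracted["R"]))               # [('a', 'b'), ('b', 'c')]
print(sorted(unrestricted_constants([R])))  # ['a', 'b', 'c', 'd', 'e']
```

In this toy instance the constant `c` is obtained only by chaining two accesses (first with `a`, then with the returned `b`), while the tuple `('d', 'e')` is never retrieved in the restricted scenario even though it is visible in the unrestricted one.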





This work was supported by the EU COST Action IC1302 KEYSTONE. Andrea Calì acknowledges partial support by the EPSRC project “Logic-based Integration and Querying of Unindexed Data” (EP/E010865/1).



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Department of Computer Science and Information Systems, Birkbeck, University of London, London, UK
  2. Oxford-Man Institute of Quantitative Finance, University of Oxford, Oxford, UK
  3. Laboratory of Web and Information Technologies, Université Libre de Bruxelles, Brussels, Belgium
