Logic, Languages, and Rules for Web Data Extraction and Reasoning over Data

  • Georg Gottlob
  • Christoph Koch
  • Andreas PierisEmail author
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10168)


This paper gives a short overview of specific logical approaches to data extraction, data management, and reasoning about data. In particular, we survey theoretical results and formalisms that have been obtained and used in the context of the Lixto Project at TU Wien, the DIADEM project at the University of Oxford, and the VADA project, which is currently being carried out jointly by the universities of Edinburgh, Manchester, and Oxford. We start with a formal approach to web data extraction rooted in monadic second order logic and monadic Datalog, which gave rise to the Lixto data extraction system. We then present some complexity results for monadic Datalog over trees and for XPath query evaluation. We further argue that for value creation and for ontological reasoning over data, we need existential quantifiers (or Skolem terms) in rule heads, and introduce the Datalog\(^\pm \) family. We give an overview of important members of this family and discuss related complexity issues.


Query Evaluation Input Tree Conjunctive Query Tree Automaton Existential Quantifier 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



This work has been supported by the EPSRC Programme Grant EP/M025268/ “VADA: Value Added Data Systems – Principles and Architecture”.


  1. 1.
    Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley, Boston (1995)zbMATHGoogle Scholar
  2. 2.
    Arenas, M., Gottlob, G., Pieris, A.: Expressive languages for querying the semantic web. In: PODS, pp. 14–26 (2014)Google Scholar
  3. 3.
    Baader, F., Calvanese, D., McGuinness, D.L., Nardi, D., Patel-Schneider, P.F.: The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, Cambridge (2003)zbMATHGoogle Scholar
  4. 4.
    Baumgartner, R., Flesca, S., Gottlob, G.: Declarative information extraction, web crawling, and recursive wrapping with Lixto. In: Eiter, T., Faber, W., Truszczyński, M. (eds.) LPNMR 2001. LNCS (LNAI), vol. 2173, pp. 21–41. Springer, Heidelberg (2001). doi: 10.1007/3-540-45402-0_2 CrossRefGoogle Scholar
  5. 5.
    Baumgartner, R., Flesca, S., Gottlob, G.: The elog web extraction language. In: Nieuwenhuis, R., Voronkov, A. (eds.) LPAR 2001. LNCS (LNAI), vol. 2250, pp. 548–560. Springer, Heidelberg (2001). doi: 10.1007/3-540-45653-8_38 CrossRefGoogle Scholar
  6. 6.
    Baumgartner, R., Flesca, S., Gottlob, G.: Visual web information extraction with lixto. In: VLDB, pp. 119–128 (2001)Google Scholar
  7. 7.
    Beeri, C., Vardi, M.Y.: The implication problem for data dependencies. In: Even, S., Kariv, O. (eds.) ICALP 1981. LNCS, vol. 115, pp. 73–85. Springer, Heidelberg (1981). doi: 10.1007/3-540-10843-2_7 CrossRefGoogle Scholar
  8. 8.
    Calì, A., Gottlob, G., Kifer, M.: Taming the infinite chase: query answering under expressive relational constraints. J. Artif. Intell. Res. 48, 115–174 (2013)MathSciNetzbMATHGoogle Scholar
  9. 9.
    Calì, A., Gottlob, G., Lukasiewicz, T.: A general datalog-based framework for tractable query answering over ontologies. J. Web Sem. 14, 57–83 (2012)CrossRefGoogle Scholar
  10. 10.
    Cali, A., Gottlob, G., Lukasiewicz, T., Marnette, B., Pieris, A.: Datalog+/-: a family of logical knowledge representation and query languages for new applications. In: LICS, pp. 228–242 (2010)Google Scholar
  11. 11.
    Calì, A., Gottlob, G., Pieris, A.: Ontological query answering under expressive entity-relationship schemata. Inf. Syst. 37(4), 320–335 (2012)CrossRefGoogle Scholar
  12. 12.
    Calì, A., Gottlob, G., Pieris, A.: Towards more expressive ontology languages: the query answering problem. Artif. Intell. 193, 87–128 (2012)MathSciNetCrossRefzbMATHGoogle Scholar
  13. 13.
    Calì, A., Kifer, M.: Containment of conjunctive object meta-queries. In: VLDB, pp. 942–952 (2006)Google Scholar
  14. 14.
    Cosmadakis, S.S., Gaifman, H., Kanellakis, P.C., Vardi, M.Y.: Decidable optimization problems for database logic programs (preliminary report). In: STOC, pp. 477–490 (1988)Google Scholar
  15. 15.
    Courcelle, B.: Graph rewriting: an algebraic and logic approach. In: van Leeuwen, J. (ed.) Handbook of Theoretical Computer Science, vol. 2, chap. 5, pp. 193–242. Elsevier Science Publishers B.V. (1990)Google Scholar
  16. 16.
    Dantsin, E., Eiter, T., Gottlob, G., Voronkov, A.: Complexity and expressive power of logic programming. ACM Comput. Surv. 33(3), 374–425 (2001)CrossRefGoogle Scholar
  17. 17.
    Deutsch, A., Tannen, V.: Reformulation of XML queries and constraints. In: Calvanese, D., Lenzerini, M., Motwani, R. (eds.) ICDT 2003. LNCS, vol. 2572, pp. 225–241. Springer, Heidelberg (2003). doi: 10.1007/3-540-36285-1_15 CrossRefGoogle Scholar
  18. 18.
    Doner, J.: Tree acceptors and some of their applications. J. Comput. Syst. Sci. 4(5), 406–451 (1970)MathSciNetCrossRefzbMATHGoogle Scholar
  19. 19.
    Fagin, R., Kolaitis, P.G., Miller, R.J., Popa, L.: Data exchange: semantics and query answering. Theor. Comput. Sci. 336(1), 89–124 (2005)MathSciNetCrossRefzbMATHGoogle Scholar
  20. 20.
    Flum, J., Frick, M., Grohe, M.: Query evaluation via tree-decompositions. In: Bussche, J., Vianu, V. (eds.) ICDT 2001. LNCS, vol. 1973, pp. 22–38. Springer, Heidelberg (2001). doi: 10.1007/3-540-44503-X_2 CrossRefGoogle Scholar
  21. 21.
    Frick, M., Grohe, M., Koch, C.: Query evaluation on compressed trees. In: LICS, pp. 22–25 (2003)Google Scholar
  22. 22.
    Furche, T., Gottlob, G., Grasso, G., Guo, X., Orsi, G., Schallhart, C., Wang, C.: DIADEM: thousands of websites to a single database. PVLDB 7(14), 1845–1856 (2014)Google Scholar
  23. 23.
    Furche, T., Gottlob, G., Libkin, L., Orsi, G., Paton, N.W.: Data wrangling for big data: challenges and opportunities. In: EDBT, pp. 473–478 (2016)Google Scholar
  24. 24.
    Furche, T., Linse, B., Bry, F., Plexousakis, D., Gottlob, G.: RDF querying: language constructs and evaluation methods compared. In: Barahona, P., Bry, F., Franconi, E., Henze, N., Sattler, U. (eds.) Reasoning Web 2006. LNCS, vol. 4126, pp. 1–52. Springer, Heidelberg (2006). doi: 10.1007/11837787_1 CrossRefGoogle Scholar
  25. 25.
    Gottlob, G., Kikot, S., Kontchakov, R., Podolskii, V.V., Schwentick, T., Zakharyaschev, M.: The price of query rewriting in ontology-based data access. Artif. Intell. 213, 42–59 (2014)MathSciNetCrossRefzbMATHGoogle Scholar
  26. 26.
    Gottlob, G., Koch, C.: Monadic queries over tree-structured data. In: LICS, pp. 189–202 (2002)Google Scholar
  27. 27.
    Gottlob, G., Koch, C.: Monadic datalog and the expressive power of languages for web information extraction. J. ACM 51(1), 74–113 (2004)MathSciNetCrossRefzbMATHGoogle Scholar
  28. 28.
    Gottlob, G., Koch, C.: A formal comparison of visual web wrapper generators. In: Wiedermann, J., Tel, G., Pokorný, J., Bieliková, M., Štuller, J. (eds.) SOFSEM 2006. LNCS, vol. 3831, pp. 30–48. Springer, Heidelberg (2006). doi: 10.1007/11611257_3 CrossRefGoogle Scholar
  29. 29.
    Gottlob, G., Koch, C., Baumgartner, R., Herzog, M., Flesca, S.: The Lixto data extraction project: back and forth between theory and practice. In: PODS, pp. 1–12 (2004)Google Scholar
  30. 30.
    Gottlob, G., Koch, C., Pichler, R.: Efficient algorithms for processing XPath queries. In: VLDB, pp. 95–106 (2002)Google Scholar
  31. 31.
    Gottlob, G., Koch, C., Pichler, R.: The complexity of XPath query evaluation. In: PODS, pp. 179–190 (2003)Google Scholar
  32. 32.
    Gottlob, G., Koch, C., Schulz, K.U.: Conjunctive queries over trees. In: PODS, pp. 189–200 (2004)Google Scholar
  33. 33.
    Gottlob, G., Manna, M., Pieris, A.: Polynomial rewritings for linear existential rules. In: IJCAI, pp. 2992–2998 (2015)Google Scholar
  34. 34.
    Gottlob, G., Orsi, G., Pieris, A.: Query rewriting and optimization for ontological databases. ACM Trans. Database Syst. 39(3), 25:1–25:46 (2014)MathSciNetCrossRefGoogle Scholar
  35. 35.
    Gottlob, G., Orsi, G., Pieris, A.: Consistency checking of re-engineered UML class diagrams via Datalog+/-. In: RuleML, pp. 35–53 (2015)Google Scholar
  36. 36.
    Gottlob, G., Pieris, A.: Beyond SPARQL under OWL 2 QL entailment regime: rules to the rescue. In: IJCAI, pp. 2999–3007 (2015)Google Scholar
  37. 37.
    Gottlob, G., Rudolph, S., Simkus, M.: Expressiveness of guarded existential rule languages. In: PODS, pp. 27–38 (2014)Google Scholar
  38. 38.
    Gottlob, G., Schwentick, T.: Rewriting ontological queries into small nonrecursive datalog programs. In: KR (2012)Google Scholar
  39. 39.
    Grau, B.C., Horrocks, I., Krötzsch, M., Kupke, C., Magka, D., Motik, B., Wang, Z.: Acyclicity conditions and their application to query answering in description logics. In: KR (2012)Google Scholar
  40. 40.
    Greenlaw, R., Hoover, H.J., Ruzzo, W.L.: Limits to Parallel Computation: P-Completeness Theory. Oxford University Press, Oxford (1995)zbMATHGoogle Scholar
  41. 41.
    Johnson, D.S., Klug, A.C.: Testing containment of conjunctive queries under functional and inclusion dependencies. J. Comput. Syst. Sci. 28(1), 167–189 (1984)MathSciNetCrossRefzbMATHGoogle Scholar
  42. 42.
    Kifer, M., Lausen, G., Wu, J.: Logical foundations of object-oriented and frame-based languages. J. ACM 42, 741–843 (1995)MathSciNetCrossRefzbMATHGoogle Scholar
  43. 43.
    Koch, C.: Efficient processing of expressive node-selecting queries on XML data in secondary storage: a tree automata-based approach. In: VLDB, pp. 249–260 (2003)Google Scholar
  44. 44.
    Laender, A.H.F., Ribeiro-Neto, B.A., da Silva, A.S.: Debye - data extraction by example. Data Knowl. Eng. 40(2), 121–154 (2002)CrossRefzbMATHGoogle Scholar
  45. 45.
    Liu, L., Pu, C., Han, W.: XWRAP: An XML-enabled wrapper construction system for web information sources. In: ICDE, pp. 611–621 (2000)Google Scholar
  46. 46.
    Ludäscher, B., Himmeröder, R., Lausen, G., May, W., Schlepphorst, C.: Managing semistructured data with FLORID: a deductive object-oriented perspective. Inf. Syst. 23(8), 589–613 (1998)CrossRefGoogle Scholar
  47. 47.
    Lukasiewicz, T., Martinez, M.V., Pieris, A., Simari, G.I.: From classical to consistent query answering under existential rules. In: AAAI, pp. 1546–1552 (2015)Google Scholar
  48. 48.
    Marnette, B.: Generalized schema-mappings: from termination to tractability. In: PODS, pp. 13–22 (2009)Google Scholar
  49. 49.
    Meuss, H., Schulz, K.U., Bry, F.: Towards aggregated answers for semistructured data. In: Bussche, J., Vianu, V. (eds.) ICDT 2001. LNCS, vol. 1973, pp. 346–360. Springer, Heidelberg (2001). doi: 10.1007/3-540-44503-X_22 CrossRefGoogle Scholar
  50. 50.
    Milani, M., Bertossi, L.: Ontology-based multidimensional contexts with applications to quality data specification and extraction. In: Bassiliades, N., Gottlob, G., Sadri, F., Paschke, A., Roman, D. (eds.) RuleML 2015. LNCS, vol. 9202, pp. 277–293. Springer, Heidelberg (2015). doi: 10.1007/978-3-319-21542-6_18 CrossRefGoogle Scholar
  51. 51.
    Miller, R.J., Hernández, M.A., Haas, L.M., Yan, L., Ho, C.T.H., Fagin, R., Popa, L.: The clio project: managing heterogeneity. SIGMOD Rec. 30(1), 78–83 (2001)CrossRefGoogle Scholar
  52. 52.
    Minoux, M.: LTUR: a simplified linear-time unit resolution algorithm for horn formulae and computer implementation. Inf. Process. Lett. 29(1), 1–12 (1988)MathSciNetCrossRefzbMATHGoogle Scholar
  53. 53.
    Neven, F., den Bussche, J.V.: Expressiveness of structured document query languages based on attribute grammars. J. ACM 49(1), 56–100 (2002)MathSciNetCrossRefzbMATHGoogle Scholar
  54. 54.
    Neven, F., Schwentick, T.: Query automata over finite trees. Theor. Comput. Sci. 275(1–2), 633–674 (2002)MathSciNetCrossRefzbMATHGoogle Scholar
  55. 55.
    Papakonstantinou, Y., Gupta, A., Garcia-Molina, H., Ullman, J.: A query translation scheme for rapid implementation of wrappers. In: Ling, T.W., Mendelzon, A.O., Vieille, L. (eds.) DOOD 1995. LNCS, vol. 1013, pp. 161–186. Springer, Heidelberg (1995). doi: 10.1007/3-540-60608-4_40 CrossRefGoogle Scholar
  56. 56.
    Sahuguet, A., Azavant, F.: Building intelligent web applications using lightweight wrappers. Data Knowl. Eng. 36(3), 283–316 (2001)CrossRefzbMATHGoogle Scholar
  57. 57.
    Seidl, H., Schwentick, T., Muscholl, A.: Numerical document queries. In: PODS, pp. 155–166 (2003)Google Scholar
  58. 58.
    Thatcher, J.W., Wright, J.B.: Generalized finite automata theory with an application to a decision problem of second-order logic. Math. Syst. Theory 2(1), 57–81 (1968)MathSciNetCrossRefzbMATHGoogle Scholar
  59. 59.
    Thomas, W.: Languages, automata, and logic. In: Rozenberg, G., Salomaa, A. (eds.) Handbook of Formal Languages, vol. 3, pp. 389–455. Springer, Heidelberg (1997). Chapter 7CrossRefGoogle Scholar

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. 1.University of OxfordOxfordUK
  2. 2.École Polytechnique Fédérale de LausanneLausanneSwitzerland
  3. 3.University of EdinburghEdinburghScotland

Personalised recommendations