Encyclopedia of Database Systems

2009 Edition

Logical Foundations of Web Data Extraction

  • Christoph Koch
Reference work entry
DOI: https://doi.org/10.1007/978-0-387-39940-9_1155


Several wrapper programming languages for extracting information from Web pages have been based on logic, specifically on fragments of datalog. This entry shows how logical languages can be used for Web information extraction, and surveys expressiveness and complexity aspects of a foundational logical wrapping language, monadic datalog.
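To make the idea concrete, the following is a minimal sketch of a monadic datalog wrapper evaluated over a small document tree. In monadic datalog every derived (intensional) predicate is unary, so a program simply selects sets of tree nodes. The relation names `firstchild`, `nextsibling`, and `label_…` follow the tree signature used in [2]. The example tree, the predicates `below` and `cell`, and the naive fixpoint evaluator are assumptions made for illustration and are not taken from the entry.

```python
# Hypothetical document tree, one integer id per node:
#   0 html -> 1 body -> [2 table, 6 td]; 2 table -> 3 tr -> [4 td, 5 td]
FACTS = {
    "firstchild": {(0, 1), (1, 2), (2, 3), (3, 4)},
    "nextsibling": {(2, 6), (4, 5)},
    "label_table": {(2,)},
    "label_td": {(4,), (5,), (6,)},
}

# Each rule is (head predicate, head variable, body atoms).
# below(x): x is a proper descendant of a table node.
# cell(x):  x is a td node below a table (the extracted nodes).
RULES = [
    ("below", "x", [("firstchild", ("y", "x")), ("label_table", ("y",))]),
    ("below", "x", [("firstchild", ("y", "x")), ("below", ("y",))]),
    ("below", "x", [("nextsibling", ("y", "x")), ("below", ("y",))]),
    ("cell", "x", [("below", ("x",)), ("label_td", ("x",))]),
]


def match(body, db, binding):
    """Yield every variable binding that satisfies all atoms in body."""
    if not body:
        yield binding
        return
    (pred, variables), rest = body[0], body[1:]
    for tup in db.get(pred, ()):
        extended = dict(binding)
        # A tuple matches if it agrees with values already bound.
        if all(extended.setdefault(v, c) == c for v, c in zip(variables, tup)):
            yield from match(rest, db, extended)


def evaluate(rules, facts):
    """Naive bottom-up evaluation: apply all rules until nothing new is derived."""
    db = {pred: set(tuples) for pred, tuples in facts.items()}
    while True:
        derived = set()
        for head, var, body in rules:
            for binding in match(body, db, {}):
                fact = (binding[var],)
                if fact not in db.get(head, set()):
                    derived.add((head, fact))
        if not derived:
            return db
        # Update only after the round, so no set changes while being iterated.
        for head, fact in derived:
            db.setdefault(head, set()).add(fact)


result = evaluate(RULES, FACTS)
print(sorted(node for (node,) in result["cell"]))
```

Node 6 is labeled `td` but is a sibling of the table rather than a descendant, so the program does not select it. The naive evaluator is chosen here for brevity, not efficiency.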

Historical Background

A substantial amount of research has studied the problem of learning wrapper programs from examples (see Wrapper Induction). Unfortunately, it is now known that the expressive power of learnable wrappers is fundamentally limited. This has motivated further work on visual wrapper programming languages, which simplify and speed up wrapper definition. Visual wrapping is now supported by several implemented systems (cf. XWRAP [3], W4F [4], and Lixto [1], the last of which is a commercial product). The Lixto project was the first to emphasize and study the expressiveness of visual wrapper languages. Its approach is based on a datalog-like language,...


Recommended Reading

  1. Baumgartner R., Flesca S., and Gottlob G. Visual web information extraction with Lixto. In Proc. 27th Int. Conf. on Very Large Data Bases, 2001.
  2. Gottlob G. and Koch C. Monadic datalog and the expressive power of web information extraction languages. J. ACM, 51(1):74–113, 2004.
  3. Liu L., Pu C., and Han W. XWRAP: An XML-enabled wrapper construction system for web information sources. In Proc. 16th IEEE Int. Conf. on Data Engineering, 2000, pp. 611–621.
  4. Sahuguet A. and Azavant F. Building intelligent web applications using lightweight wrappers. Data Knowl. Eng., 36(3):283–316, 2001.
  5. Shen W., Doan A., Naughton J.F., and Ramakrishnan R. Declarative information extraction using datalog with embedded extraction predicates. In Proc. 33rd Int. Conf. on Very Large Data Bases, 2007, pp. 1033–1044.

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • Christoph Koch (1)

  1. Cornell University, Ithaca, NY, USA