Definition
Several wrapper programming languages for extracting information from Web pages have been based on logic, specifically on fragments of datalog. This entry shows how logical languages can be used for Web information extraction, and surveys expressiveness and complexity aspects of a foundational logical wrapping language, monadic datalog.
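To make the idea of a logical wrapper concrete, the following is a minimal sketch of a monadic datalog program evaluated over a small document tree. It assumes the tree is encoded with unary label predicates and the binary relations firstchild and nextsibling, in the style of the monadic datalog paper by Gottlob and Koch listed under Recommended Reading. The toy tree, the predicates rowchild and cell, and the naive fixpoint evaluator are illustrative assumptions, not part of any actual wrapper system.

```python
# Hypothetical toy document tree: node id -> (label, list of child ids).
TREE = {
    0: ("table", [1, 2]),
    1: ("tr", [3, 4]),
    2: ("tr", [5]),
    3: ("td", []),
    4: ("td", []),
    5: ("th", []),
}


def extensional_facts(tree):
    """Encode the tree as label_*/1, firstchild/2 and nextsibling/2 facts."""
    facts = set()
    for node, (label, children) in tree.items():
        facts.add(("label_" + label, (node,)))
        if children:
            facts.add(("firstchild", (node, children[0])))
        for left, right in zip(children, children[1:]):
            facts.add(("nextsibling", (left, right)))
    return facts


# Each rule is (head, body); an atom is (predicate, tuple of variable names).
# Every head predicate is unary, which is what makes the program monadic.
RULES = [
    # A node is a row child if it is the first child of a tr node ...
    (("rowchild", ("X",)), [("label_tr", ("P",)), ("firstchild", ("P", "X"))]),
    # ... or the next sibling of a row child.
    (("rowchild", ("X",)), [("rowchild", ("Y",)), ("nextsibling", ("Y", "X"))]),
    # The wrapper extracts the td nodes among the row children.
    (("cell", ("X",)), [("rowchild", ("X",)), ("label_td", ("X",))]),
]


def matches(body, facts, binding):
    """Yield every extension of `binding` that satisfies all atoms in `body`."""
    if not body:
        yield binding
        return
    (pred, args), rest = body[0], body[1:]
    for fact_pred, values in facts:
        if fact_pred != pred or len(values) != len(args):
            continue
        new = dict(binding)
        # setdefault binds a fresh variable or returns its existing value.
        if all(new.setdefault(a, v) == v for a, v in zip(args, values)):
            yield from matches(rest, facts, new)


def evaluate(rules, facts):
    """Naive bottom-up evaluation: apply all rules until nothing new is derived."""
    facts = set(facts)
    while True:
        derived = {
            (head_pred, tuple(b[v] for v in head_args))
            for (head_pred, head_args), body in rules
            for b in matches(body, facts, {})
        }
        if derived <= facts:
            return facts
        facts |= derived


def extract(pred, tree=TREE, rules=RULES):
    """Return the sorted node ids selected by the unary predicate `pred`."""
    result = evaluate(rules, extensional_facts(tree))
    return sorted(values[0] for p, values in result if p == pred)
```

Calling extract("cell") on the toy tree selects the two td nodes. The naive evaluator is chosen here only for brevity; it does not reflect the complexity results surveyed in the entry.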
Historical Background
A substantial amount of research has studied the problem of learning wrapper programs from examples (see Wrapper Induction). Unfortunately, it is now known that the expressive power of learnable wrappers is fundamentally limited. This has motivated further work on visual wrapper programming languages, which simplify and speed up wrapper definition. Visual wrapping is now supported by several implemented systems, among them XWrap [3], W4F [4], and the commercial product Lixto [1]. The Lixto project was the first to emphasize and study the expressiveness of visual wrapper languages. Its approach is based on a datalog-like language,...
Recommended Reading
1. Baumgartner R, Flesca S, Gottlob G. Visual web information extraction with Lixto. In: Proceedings of the 27th International Conference on Very Large Data Bases; 2001.
2. Gottlob G, Koch C. Monadic datalog and the expressive power of web information extraction languages. J ACM. 2004;51(1):74–113.
3. Liu L, Pu C, Han W. XWRAP: an XML-enabled wrapper construction system for web information sources. In: Proceedings of the 16th IEEE International Conference on Data Engineering; 2000. p. 611–21.
4. Sahuguet A, Azavant F. Building intelligent web applications using lightweight wrappers. Data Knowl Eng. 2001;36(3):283–316.
5. Shen W, Doan A, Naughton JF, Ramakrishnan R. Declarative information extraction using datalog with embedded extraction predicates. In: Proceedings of the 33rd International Conference on Very Large Data Bases; 2007. p. 1033–44.
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Koch, C. (2018). Logical Foundations of Web Data Extraction. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_1155
DOI: https://doi.org/10.1007/978-1-4614-8265-9_1155
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science, Reference Module Computer Science and Engineering