Abstract
We present BFOIL, a bottom-up inductive logic programming algorithm that synthesizes logic programs for multi-slot information extraction from hypertext documents. BFOIL learns from positive examples only and represents hypertext documents logically on the basis of the document object model (DOM). We briefly discuss several refinements of BFOIL and report promising results for our IE system LIPX in comparison with state-of-the-art IE systems.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Thomas, B. (2003). Bottom-Up Learning of Logic Programs for Information Extraction from Hypertext Documents. In: Lavrač, N., Gamberger, D., Todorovski, L., Blockeel, H. (eds.) Knowledge Discovery in Databases: PKDD 2003. Lecture Notes in Computer Science, vol 2838. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39804-2_39
DOI: https://doi.org/10.1007/978-3-540-39804-2_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20085-7
Online ISBN: 978-3-540-39804-2
eBook Packages: Springer Book Archive