Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Wrapper Induction

  • Max Goebel
  • Michal Ceresna
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_1160

Synonyms

Information extraction; Wrapper generation

Definition

Wrapper induction (or query induction) is a subfield of wrapper generation, which itself belongs to the broader field of information extraction (IE). In IE, wrappers transform unstructured input into structured output formats, and a wrapper generation system specifies the rules that govern this transformation. Wrapper induction is an approach to wrapper generation in which the transformation rules are learned from examples and counterexamples (inductive learning). The induced wrapper is subsequently applied to unseen input documents to collect further label relations of interest. To ease the annotation of examples, the learning framework is often embedded in a visual annotation environment, where the user selects and deselects elements visually.
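
The learn-then-apply cycle described above can be illustrated with a minimal sketch. It assumes the simplest kind of wrapper, a pair of left and right delimiter strings, and it is not the algorithm of any system listed under Recommended Reading. The function names (induce_lr_wrapper, apply_wrapper, is_consistent) and the sample HTML snippets are hypothetical and chosen only for illustration.

```python
import os
import re


def common_suffix(strings):
    """Return the longest string that every input ends with."""
    reversed_prefix = os.path.commonprefix([s[::-1] for s in strings])
    return reversed_prefix[::-1]


def induce_lr_wrapper(examples):
    """Learn a (left, right) delimiter pair from annotated examples.

    Each example is a (document, target) pair, where target is the
    substring the user marked as a value to extract.
    """
    befores, afters = [], []
    for document, target in examples:
        start = document.index(target)  # first occurrence of the marked value
        befores.append(document[:start])
        afters.append(document[start + len(target):])
    left = common_suffix(befores)        # context shared before every value
    right = os.path.commonprefix(afters)  # context shared after every value
    if not left or not right:
        raise ValueError("no common delimiters found in the examples")
    return left, right


def apply_wrapper(wrapper, document):
    """Extract every value enclosed by the learned delimiters."""
    left, right = wrapper
    pattern = re.escape(left) + r"(.*?)" + re.escape(right)
    return re.findall(pattern, document, flags=re.DOTALL)


def is_consistent(wrapper, document, positives, negatives):
    """Check a wrapper against examples and counterexamples on one document."""
    extracted = set(apply_wrapper(wrapper, document))
    covers_all = all(p in extracted for p in positives)
    rejects_all = not any(n in extracted for n in negatives)
    return covers_all and rejects_all


# Two annotated examples (assumed data, not taken from the entry).
training = [
    ("<li><b>Belize</b> <i>501</i></li>", "Belize"),
    ("<li><b>Spain</b> <i>34</i></li>", "Spain"),
]
wrapper = induce_lr_wrapper(training)
print(wrapper)  # ('<li><b>', '</b> <i>')

# Apply the induced wrapper to an unseen document.
unseen = "<ul><li><b>Congo</b> <i>242</i></li><li><b>Egypt</b> <i>20</i></li></ul>"
print(apply_wrapper(wrapper, unseen))  # ['Congo', 'Egypt']
```

In this sketch, counterexamples only serve to reject a candidate wrapper through is_consistent. A fuller learner would use them to refine the delimiters, for instance by requesting further annotations until no counterexample is extracted.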

The term “wrapper induction” was introduced by Nicholas Kushmerick in his influential 1997 Ph.D. thesis in the context of semi-structured Web...

Recommended Reading

  1. Adelberg B. NoDoSE: a tool for semi-automatically extracting structured and semistructured data from text documents. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 1998. p. 283–94.
  2. Baumgartner R, Flesca S, Gottlob G. Visual web information extraction with Lixto. In: Proceedings of the 27th International Conference on Very Large Data Bases; 2001. p. 119–28.
  3. Carme J, Ceresna M, Goebel M. Web wrapper specification using compound filter learning. In: Proceedings of the IADIS International Conference on WWW/Internet 2006; 2006.
  4. Chang CH, Kuo SC. OLERA: semisupervised web-data extraction with visual support. IEEE Intell Syst. 2004;19(6):56–64.
  5. Finn A, Kushmerick N. Active learning selection strategies for information extraction. In: Proceedings of the Workshop on Adaptive Text Extraction and Mining; 2003.
  6. Freitag D, Kushmerick N. Boosted wrapper induction. In: Proceedings of the 12th National Conference on AI; 2000. p. 577–83.
  7. Hsu CN, Dung MT. Generating finite-state transducers for semi-structured data extraction from the web. Inf Syst. 1998;23(8):521–38.
  8. Irmak U, Suel T. Interactive wrapper generation with minimal user effort. In: Proceedings of the 15th International World Wide Web Conference; 2006. p. 553–63.
  9. Knoblock CA, Lerman K, Minton S, Muslea I. Accurately and reliably extracting data from the web: a machine learning approach. Q Bull, IEEE TC Data Eng. 2000;23(4):33–41.
  10. Kushmerick N. Wrapper induction for information extraction. PhD thesis, University of Washington; 1997.
  11. Kushmerick N. Wrapper induction: efficiency and expressiveness. Artif Intell. 2000;118(1–2):15–68.
  12. Laender AHF, Ribeiro-Neto B, da Silva AS. DEByE – data extraction by example. Data Knowl Eng. 2002;40(2):121–54.
  13. Liu L, Pu C, Han W. XWRAP: an XML-enabled wrapper construction system for web information sources. In: Proceedings of the 16th International Conference on Data Engineering; 2000. p. 611–21.
  14. Muslea I, Minton S, Knoblock C. STALKER: learning extraction rules for semistructured, web-based information sources. 1998. URL http://citeseer.ist.psu.edu/muslea98stalker.html
  15. Muslea I, Minton S, Knoblock CA. Selective sampling with redundant views. In: Proceedings of the 12th National Conference on AI; 2000. p. 621–26.
  16. Sahuguet A, Azavant F. WysiWyg web wrapper factory (W4F). 2001. URL http://citeseer.ist.psu.edu/553711.html; http://www.ai.mit.edu/people/jimmylin/papers/Sahuguet99.ps
  17. Seymore K, McCallum A, Rosenfeld R. Learning hidden Markov model structure for information extraction. In: Proceedings of the AAAI 99 Workshop on Machine Learning for Information Extraction; 1999.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Vienna University of Technology, Vienna, Austria
  2. Lixto Software GmbH, Vienna, Austria

Section editors and affiliations

  • Georg Gottlob (1)
  1. Computing Lab., Oxford Univ., Oxford, UK