Encyclopedia of Database Systems

2018 Edition
Editors: Ling Liu, M. Tamer Özsu

Wrapper Maintenance

  • Kristina Lerman
  • Craig A. Knoblock
Reference work entry
DOI: https://doi.org/10.1007/978-1-4614-8265-9_1158

Synonyms

Wrapper repair; Wrapper verification and reinduction

Definition

A Web wrapper is a software application that extracts information from a semi-structured source and converts it to a structured format. Although semi-structured sources, such as Web pages, contain no explicitly specified schema, they do have an implicit grammar that can be used to identify relevant information in the document. A wrapper learning system analyzes the page layout to generate either grammar-based or “landmark”-based extraction rules, which the wrapper then uses to extract data. Because these rules depend on the layout, even slight changes to it can break the wrapper and prevent it from extracting data correctly. Wrapper maintenance is a composite task that (i) verifies that the wrapper continues to extract data correctly from a source, and (ii) repairs the wrapper so that it works on the changed pages.
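The two parts of the maintenance task can be illustrated with a minimal Python sketch. It is not taken from any of the systems cited in this entry. The landmark strings, the price pattern, the sample pages, and the function names `extract`, `verify`, and `relabel` are all hypothetical and chosen only for illustration. The sketch assumes a single landmark-based rule (the text between a left and a right landmark) and a simple regular expression that describes what correctly extracted data looks like.

```python
import re
from typing import Optional


def extract(page: str, left: str, right: str) -> Optional[str]:
    """Return the text between the first `left` landmark and the next `right` landmark."""
    start = page.find(left)
    if start == -1:
        return None
    start += len(left)
    end = page.find(right, start)
    if end == -1:
        return None
    return page[start:end].strip()


def verify(value: Optional[str], pattern: str) -> bool:
    """Verification: does the extracted value still look like the expected data?"""
    return value is not None and re.fullmatch(pattern, value) is not None


def relabel(page: str, pattern: str) -> Optional[str]:
    """One possible repair step: find data matching the known pattern on the
    changed page, so that it can serve as a labeled example for re-induction."""
    match = re.search(pattern, page)
    return match.group(0) if match else None


# Hypothetical landmark rule and data pattern (assumptions, not from the entry).
LEFT, RIGHT = "<b>Price:</b>", "<br>"
PRICE_PATTERN = r"\$\d+\.\d{2}"

old_page = "<html><b>Price:</b> $19.99<br></html>"
new_page = "<html><span class='price'>Price: $19.99</span></html>"  # layout changed

old_value = extract(old_page, LEFT, RIGHT)
new_value = extract(new_page, LEFT, RIGHT)

print(verify(old_value, PRICE_PATTERN))  # the rule works on the old layout
print(verify(new_value, PRICE_PATTERN))  # the rule is broken by the new layout
print(relabel(new_page, PRICE_PATTERN))  # candidate example for repairing the rule
```

In this sketch the landmark rule fails on the changed page because the `<b>` landmark no longer appears, verification detects the failure, and the relabeling step recovers a candidate example from which a new rule could be learned. A real system would need more robust data descriptions than a single regular expression.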

Historical Background

Wrapper induction algorithms [3, 6, 11] exploit regularities in the page layout to find a set of...

Recommended Reading

  1. Chidlovskii B. Automatic repairing of web wrappers by combining redundant views. In: Proceedings of the 14th IEEE International Conference Tools with Artificial Intelligence; 2002. p. 399–406.
  2. Crescenzi V, Mecca G. Automatic information extraction from large websites. J ACM. 2004;51(5):731–79.
  3. Hsu C-N, Dung M-T. Generating finite-state transducers for semi-structured data extraction from the web. J Inf Syst. 1998;23(8):521–38.
  4. Kushmerick N. Regression testing for wrapper maintenance. In: Proceedings of the 14th National Conference on AI; 1999. p. 74–9.
  5. Kushmerick N. Wrapper verification. World Wide Web J. 2000;3(2):79–94.
  6. Kushmerick N, Weld DS, Doorenbos RB. Wrapper induction for information extraction. In: Proceedings of the 15th International Joint Conference on AI; 1997. p. 729–37.
  7. Lerman K, Gazen C, Minton S, Knoblock CA. Populating the semantic web. In: Proceedings of the AAAI Workshop on Advances in Text Extraction and Mining; 2004.
  8. Lerman K, Minton S. Learning the common structure of data. In: Proceedings of the 12th National Conference on AI; 2000. p. 609–14.
  9. Lerman K, Minton S, Knoblock C. Wrapper maintenance: a machine learning approach. J Artif Intell Res. 2003;18:149–81.
  10. Meng X, Hu D, Li C. Schema-guided wrapper maintenance. In: Proceedings of the 2003 Conference on Web Information and Data Management; 2003. p. 1–8.
  11. Muslea I, Minton S, Knoblock CA. Hierarchical wrapper induction for semistructured information sources. Auton Agent Multi Agent Syst. 2001;4(1–2):93–114.
  12. Raposo J, Pan A, Alvarez M, Hidalgo J. Automatically generating labeled examples for web wrapper maintenance. In: Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence; 2005. p. 250–6.
  13. Raposo J, Pan A, Álvarez M, Hidalgo J. Automatically maintaining wrappers for semi-structured web sources. Data Knowl Eng. 2007;61(2):331–58.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. University of Southern California, Marina del Rey, Los Angeles, USA

Section editors and affiliations

  • Georg Gottlob
  1. Computing Lab., Oxford Univ., Oxford, UK