Wrapper Maintenance

Synonyms

Wrapper repair; Wrapper verification and reinduction

Definition

A Web wrapper is a software application that extracts information from a semi-structured source and converts it to a structured format. While semi-structured sources, such as Web pages, contain no explicitly specified schema, they do have an implicit grammar that can be used to identify relevant information in the document. A wrapper learning system analyzes page layout to generate either grammar-based or “landmark”-based extraction rules that wrappers use to extract data. Because these rules depend on the layout, even slight changes to it can break the wrapper and prevent it from extracting data correctly. Wrapper maintenance is a composite task that (i) verifies that the wrapper continues to extract data correctly from a source, and (ii) repairs the wrapper so that it works on the changed pages.
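
The two subtasks can be illustrated with a minimal Python sketch. It is not the method of any system cited in this entry. It assumes a single landmark-based rule (a left and a right delimiter string), a hand-written regular expression standing in for a learned description of the data, and two made-up page fragments. The names extract, verify, reinduce, and PRICE_PATTERN are hypothetical.

```python
import re
from typing import Optional, Pattern, Tuple

Rule = Tuple[str, str]  # (left landmark, right landmark); hypothetical representation


def extract(page: str, left: str, right: str) -> Optional[str]:
    """Return the text between the first left landmark and the next right landmark."""
    start = page.find(left)
    if start == -1:
        return None
    start += len(left)
    end = page.find(right, start)
    if end == -1:
        return None
    return page[start:end].strip()


# Assumed description of correct data: a dollar amount with two decimals.
PRICE_PATTERN = re.compile(r"\$\d+\.\d{2}")


def verify(value: Optional[str]) -> bool:
    """Step (i): check that the extracted value still looks like the expected data."""
    return value is not None and PRICE_PATTERN.fullmatch(value) is not None


def reinduce(page: str, pattern: Pattern[str]) -> Optional[Rule]:
    """Step (ii), naive version: locate data matching the pattern on the changed
    page and use the tags immediately around it as the new landmarks."""
    match = pattern.search(page)
    if match is None:
        return None
    left_start = page.rfind("<", 0, match.start())
    right_end = page.find(">", match.end())
    if left_start == -1 or right_end == -1:
        return None
    return page[left_start:match.start()], page[match.end():right_end + 1]


# Made-up page fragments: the site replaces <b> with a styled <span>.
old_page = "<tr><td>Price:</td><td><b>$29.95</b></td></tr>"
new_page = "<tr><td>Price:</td><td><span class='p'>$29.95</span></td></tr>"

rule: Rule = ("<b>", "</b>")
old_value = extract(old_page, *rule)   # rule works on the original layout
new_value = extract(new_page, *rule)   # landmark "<b>" no longer exists

repaired_rule = None
if not verify(new_value):
    repaired_rule = reinduce(new_page, PRICE_PATTERN)
```

In this sketch the original rule fails on the changed page because its left landmark has disappeared, verification flags the failure, and the repaired rule extracts the same value again. A realistic system would work from many example pages and several data fields rather than one fragment and one pattern.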

Historical Background

Wrapper induction algorithms [3, 6, 11] exploit regularities in the page layout to find a set of...

Recommended Reading

  1. Chidlovskii B. Automatic repairing of web wrappers by combining redundant views. In: Proceedings of the 14th IEEE International Conference Tools with Artificial Intelligence; 2002. p. 399–406.
  2. Crescenzi V, Mecca G. Automatic information extraction from large websites. J ACM. 2004;51(5):731–79.
  3. Hsu C-N, Dung M-T. Generating finite-state transducers for semi-structured data extraction from the web. J Inf Syst. 1998;23(8):521–38.
  4. Kushmerick N. Regression testing for wrapper maintenance. In: Proceedings of the 14th National Conference on AI; 1999. p. 74–9.
  5. Kushmerick N. Wrapper verification. World Wide Web J. 2000;3(2):79–94.
  6. Kushmerick N, Weld DS, Doorenbos RB. Wrapper induction for information extraction. In: Proceedings of the 15th International Joint Conference on AI; 1997. p. 729–37.
  7. Lerman K, Gazen C, Minton S, Knoblock CA. Populating the semantic web. In: Proceedings of the AAAI Workshop on Advances in Text Extraction and Mining; 2004.
  8. Lerman K, Minton S. Learning the common structure of data. In: Proceedings of the 12th National Conference on AI; 2000. p. 609–14.
  9. Lerman K, Minton S, Knoblock C. Wrapper maintenance: a machine learning approach. J Artif Intell Res. 2003;18:149–81.
  10. Meng X, Hu D, Li C. Schema-guided wrapper maintenance. In: Proceedings of the 2003 Conference on Web Information and Data Management; 2003. p. 1–8.
  11. Muslea I, Minton S, Knoblock CA. Hierarchical wrapper induction for semistructured information sources. Auton Agent Multi Agent Syst. 2001;4(1–2):93–114.
  12. Raposo J, Pan A, Álvarez M, Hidalgo J. Automatically generating labeled examples for web wrapper maintenance. In: Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence; 2005. p. 250–6.
  13. Raposo J, Pan A, Álvarez M, Hidalgo J. Automatically maintaining wrappers for semi-structured web sources. Data Knowl Eng. 2007;61(2):331–58.

Author information

Correspondence to Kristina Lerman.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

Cite this entry

Lerman, K., Knoblock, C.A. (2018). Wrapper Maintenance. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_1158
