Wrapper adaptability; Wrapper robustness
A wrapper, whether handwritten or generated by a Web Data Extraction System, is a program that extracts data from information sources of changing content and translates the data into a different format. The stability of a wrapper is its degree of insensitivity to changes in the presentation (i.e., formatting, layout, or syntax) of the data sources. Ideally, a stable wrapper can extract the desired data from source documents even if their current layout differs from the layout of the documents that were used as examples when the wrapper was generated. Thus, a wrapper is stable, or robust, if it is capable of coping with perturbations of the original layout.
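The definition above can be illustrated with a minimal sketch. It assumes two hypothetical, well-formed pages that carry the same data, the second with an extra block inserted at the top, and it compares a wrapper that addresses the data by absolute position with one that addresses it by an attribute. The page contents, the `price` class, and the function names `brittle_wrapper` and `stable_wrapper` are illustrative assumptions, not the technique of any particular extraction system.

```python
import xml.etree.ElementTree as ET

# Hypothetical example page used when the wrapper was written.
ORIGINAL = """<html><body>
<div><span class="title">Widget</span><span class="price">9.99</span></div>
</body></html>"""

# Same data, but a new block was inserted at the top of the page.
CHANGED = """<html><body>
<div class="ad">Buy now!</div>
<div><span class="title">Widget</span><span class="price">9.99</span></div>
</body></html>"""


def brittle_wrapper(page):
    """Extract the price by absolute position: first div, second span."""
    node = ET.fromstring(page).find("./body/div[1]/span[2]")
    return node.text if node is not None else None


def stable_wrapper(page):
    """Extract the price via an attribute anchor, wherever the span sits."""
    node = ET.fromstring(page).find(".//span[@class='price']")
    return node.text if node is not None else None


if __name__ == "__main__":
    for name, page in (("original", ORIGINAL), ("changed", CHANGED)):
        print(name, brittle_wrapper(page), stable_wrapper(page))
```

In this sketch the positional wrapper finds the price only on the original page, while the attribute-anchored wrapper finds it on both. The second wrapper is stable only with respect to this kind of perturbation: it would fail in turn if the assumed `price` class were renamed.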
Documents, and Web pages in particular, are usually susceptible to slight layout changes over time. Most of these changes are minor. For example, an advertisement may be added to the top of a Web page. One would ideally wish that...