Intelligent Self-repairable Web Wrappers

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6934)

Abstract

The amount of information available on the Web grows at an incredibly high rate. Systems and procedures for extracting these data from Web sources already exist, and different approaches and techniques have been investigated in recent years. On the one hand, reliable solutions should provide robust Web data mining algorithms that can automatically cope with possible malfunctions or failures. On the other hand, the literature lacks solutions for the maintenance of these systems. Procedures that extract Web data may be tightly coupled to the structure of the data source itself; thus, malfunctions or the acquisition of corrupted data can be caused, for example, by structural modifications that owners make to their data sources. Nowadays, verification of data integrity and maintenance are mostly performed manually to ensure that these systems work correctly and reliably. In this paper we propose a novel approach to creating procedures that extract data from Web sources, the so-called Web wrappers, which can cope with malfunctions caused by modifications to the structure of the data source and can automatically repair themselves.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ferrara, E., Baumgartner, R. (2011). Intelligent Self-repairable Web Wrappers. In: Pirrone, R., Sorbello, F. (eds) AI*IA 2011: Artificial Intelligence Around Man and Beyond. AI*IA 2011. Lecture Notes in Computer Science, vol 6934. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23954-0_26

  • DOI: https://doi.org/10.1007/978-3-642-23954-0_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-23953-3

  • Online ISBN: 978-3-642-23954-0

  • eBook Packages: Computer Science, Computer Science (R0)
