Information Extraction for the Semantic Web

  • Chapter
Reasoning Web

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3564)

Abstract

The World Wide Web represents a universe of knowledge and information. Unfortunately, querying and accessing the desired information is not straightforward. Languages and tools are required for accessing, extracting, transforming, and syndicating this information. The Web should be useful not merely for human consumption but also for machine communication. Therefore, powerful and user-friendly tools are needed that build on expressive languages for extracting and integrating information from various Web sources or, more generally, from heterogeneous sources. The tutorial gives an introduction to the Web technologies required in this context and presents various approaches and techniques used in information extraction and integration. Moreover, sample applications in various domains motivate the discussed topics, and the tutorial illustrates how data instances can be provided for the Semantic Web.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Baumgartner, R., Eiter, T., Gottlob, G., Herzog, M., Koch, C. (2005). Information Extraction for the Semantic Web. In: Eisinger, N., Małuszyński, J. (eds) Reasoning Web. Lecture Notes in Computer Science, vol 3564. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11526988_8

  • DOI: https://doi.org/10.1007/11526988_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-27828-3

  • Online ISBN: 978-3-540-31675-6

  • eBook Packages: Computer Science (R0)
