Information Extraction for the Semantic Web

Baumgartner, Robert; Eiter, Thomas; Gottlob, Georg; Herzog, Marcus; Koch, Christoph

doi:10.1007/11526988_8

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3564)

Abstract

The World Wide Web represents a universe of knowledge and information. Unfortunately, querying and accessing the desired information is not straightforward. Languages and tools for accessing, extracting, transforming, and syndicating that information are required. The Web should be useful not merely for human consumption but also for machine communication. Therefore, powerful and user-friendly tools are needed, based on expressive languages, for extracting and integrating information from various Web sources or, more generally, from various heterogeneous sources. The tutorial gives an introduction to the Web technologies required in this context and presents various approaches and techniques used in information extraction and integration. Moreover, sample applications in various domains motivate the topics discussed, and the tutorial illustrates how data instances can be provided for the Semantic Web.

Author information

Authors and Affiliations

Database and Artificial Intelligence Group, Institute of Information Systems, Vienna University of Technology, Favoritenstrasse 9-11, 1040, Vienna, Austria
Robert Baumgartner, Georg Gottlob, Marcus Herzog & Christoph Koch
Knowledge-Based Systems Group, Institute of Information Systems, Vienna University of Technology, Favoritenstrasse 9-11, 1040, Vienna, Austria
Thomas Eiter

Editor information

Editors and Affiliations

Institute for Informatics, University of Munich, Oettingenstraße 67, D-80538, München, Germany
Norbert Eisinger
Department of Computer and Information Science, Linköping University, 581 83, Linköping, Sweden
Jan Małuszyński

About this chapter

Cite this chapter

Baumgartner, R., Eiter, T., Gottlob, G., Herzog, M., Koch, C. (2005). Information Extraction for the Semantic Web. In: Eisinger, N., Małuszyński, J. (eds) Reasoning Web. Lecture Notes in Computer Science, vol 3564. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11526988_8

DOI: https://doi.org/10.1007/11526988_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27828-3
Online ISBN: 978-3-540-31675-6
eBook Packages: Computer Science, Computer Science (R0)
