iDocument: Using Ontologies for Extracting and Annotating Information from Unstructured Text

Adrian, Benjamin; Hees, Jörn; van Elst, Ludger; Dengel, Andreas

doi:10.1007/978-3-642-04617-9_32

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5803)

Included in the following conference series:

Annual Conference on Artificial Intelligence

Abstract

Because the Web contains a huge amount of text data, annotating unstructured text with semantic markup is a crucial topic in Semantic Web research. This work formally analyzes how domain ontologies are incorporated into information extraction tasks in iDocument. Ontology-based information extraction exploits domain ontologies, which hold formalized and structured domain knowledge, to extract domain-relevant information from unannotated, unstructured text. iDocument provides a pipeline architecture, an extraction template interface, and the ability to exchange domain ontologies when performing information extraction tasks. This work outlines iDocument's ontology-based architecture, the use of SPARQL queries as extraction templates, and an evaluation of iDocument in an automatic document annotation scenario.
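To make the idea of a query acting as an extraction template more concrete, the following is a minimal sketch in plain Python. It is not iDocument's implementation: the toy ontology, the `ex:` identifiers, and the functions `recognize`, `match`, and `extract` are all hypothetical, and the template is a hand-evaluated triple pattern that only imitates the WHERE clause of a SPARQL SELECT query. The sketch assumes a two-step pipeline (recognize ontology instances in the text, then keep only template results whose values were all found in the text) and takes the ontology as a parameter so that it can be exchanged.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy domain ontology: instance triples plus human-readable labels.
ONTOLOGY: Set[Triple] = {
    ("ex:Adrian", "rdf:type", "ex:Person"),
    ("ex:Dengel", "rdf:type", "ex:Person"),
    ("ex:DFKI", "rdf:type", "ex:Organization"),
    ("ex:Adrian", "ex:memberOf", "ex:DFKI"),
    ("ex:Dengel", "ex:memberOf", "ex:DFKI"),
}
LABELS: Dict[str, str] = {
    "ex:Adrian": "Benjamin Adrian",
    "ex:Dengel": "Andreas Dengel",
    "ex:DFKI": "DFKI",
}

# Extraction template, imitating:
#   SELECT ?person ?org
#   WHERE { ?person rdf:type ex:Person . ?person ex:memberOf ?org }
TEMPLATE: List[Triple] = [
    ("?person", "rdf:type", "ex:Person"),
    ("?person", "ex:memberOf", "?org"),
]


def recognize(text: str, labels: Dict[str, str]) -> Set[str]:
    """Return the ontology instances whose label occurs in the text."""
    return {
        uri
        for uri, label in labels.items()
        if re.search(r"\b" + re.escape(label) + r"\b", text)
    }


def match(
    pattern: List[Triple],
    triples: Set[Triple],
    binding: Optional[Dict[str, str]] = None,
) -> List[Dict[str, str]]:
    """Evaluate a list of triple patterns; return all variable bindings."""
    binding = binding or {}
    if not pattern:
        return [binding]
    head, rest = pattern[0], pattern[1:]
    results: List[Dict[str, str]] = []
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(head, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            results.extend(match(rest, triples, candidate))
    return results


def extract(
    text: str,
    template: List[Triple],
    ontology: Set[Triple],
    labels: Dict[str, str],
) -> List[Dict[str, str]]:
    """Keep only template results whose values were all recognized in the text."""
    found = recognize(text, labels)
    rows = [
        b for b in match(template, ontology)
        if all(value in found for value in b.values())
    ]
    return sorted(rows, key=lambda b: sorted(b.items()))


if __name__ == "__main__":
    sentence = "Benjamin Adrian works at DFKI in Kaiserslautern."
    print(extract(sentence, TEMPLATE, ONTOLOGY, LABELS))
```

Because `extract` receives the ontology and labels as arguments, a different domain can be plugged in without touching the matching code, which is the property the abstract describes as exchanging domain ontologies. A real system would presumably delegate template evaluation to a SPARQL engine rather than the hand-written matcher used here.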

Author information

Authors and Affiliations

Knowledge Management Department, DFKI, Kaiserslautern, Germany
Benjamin Adrian, Ludger van Elst & Andreas Dengel
CS Department, University of Kaiserslautern, Kaiserslautern, Germany
Jörn Hees & Andreas Dengel

Editor information

Editors and Affiliations

GET Lab, University of Paderborn, Pohlweg 47-49, 33098, Paderborn, Germany
Bärbel Mertsching, Marcus Hund & Zaheer Aziz

About this paper

Cite this paper

Adrian, B., Hees, J., van Elst, L., Dengel, A. (2009). iDocument: Using Ontologies for Extracting and Annotating Information from Unstructured Text. In: Mertsching, B., Hund, M., Aziz, Z. (eds) KI 2009: Advances in Artificial Intelligence. KI 2009. Lecture Notes in Computer Science, vol 5803. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04617-9_32

DOI: https://doi.org/10.1007/978-3-642-04617-9_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04616-2
Online ISBN: 978-3-642-04617-9
eBook Packages: Computer Science, Computer Science (R0)
